System and method for software diagnostics using a combination of visual and dynamic tracing

ABSTRACT

A software system is disclosed that provides remote troubleshooting and tracing of the execution of computer programs. The software system allows a remote software developer or help desk person to troubleshoot computer environment and installation problems such as missing or corrupted environment variables, files, DLLs, registry entries, and the like. In one embodiment the software system includes an information-gathering module that gathers run-time information about program execution, program interaction with the operating system and the system resources. The information-gathering module also monitors user actions and captures screen output. The information-gathering module passes the gathered information to an information-display module. The information-display module allows a support technician (e.g., a software developer, a help desk person, etc.) to see the user interactions with the program and corresponding reactions of the system. In one embodiment, the information-display module allows the support technician to remotely view environment variables, file access operations, system interactions, and user interactions that occur on the user's computer and locate failed operations that cause execution problems.

REFERENCE TO RELATED APPLICATIONS

[0001] The present application claims priority benefit of Provisional Application No. 60/186,636, filed Mar. 3, 2000, titled “SYSTEM AND METHOD FOR SOFTWARE DIAGNOSTICS USING COMBINATION OF VISUAL AND DYNAMIC TRACING,” the disclosure of which is incorporated herein by reference in its entirety.

BACKGROUND OF THE INVENTION

[0002] 1. Field of the Invention

[0003] The present invention relates to software tools for assisting software developers and help desk personnel in the task of monitoring and analyzing the execution of computer programs running on remote computers and detection and troubleshooting of execution problems.

[0004] 2. Description of the Related Art

[0005] The problem of ascertaining why a particular piece of software is malfunctioning is currently solved by a number of techniques including static analysis of configuration problems and conventional debugging techniques such as run-time debugging and tracing. Despite the significant diversity in software tracing and debugging programs (“debuggers”), virtually all debuggers share a common operational model: the developer notices the presence of a bug during normal execution, and then uses the debugger to examine the program's behavior. The second part of this process is usually accomplished by setting a breakpoint near a possibly flawed section of code, and upon reaching the breakpoint, single-stepping forward through the section of code to evaluate the cause of the problem.

[0006] Two significant problems arise in using this model. First, the developer needs to know in advance where the problem resides in order to set an appropriate breakpoint location. Setting such a breakpoint can be difficult when working with an event-driven system (such as the Microsoft Windows® operating system), because the developer does not always know which of the event handlers (callbacks) will be called.

[0007] The second problem is that some bugs give rise to actual errors only during specific execution conditions, and these conditions cannot always be reproduced during the debugging process. For example, a program error that occurs during normal execution may not occur during execution under the debugger, since the debugger affects the execution of the program. This situation is analogous to the famous “Heisenberg effect” in physics: the tool that is used to analyze the phenomenon actually changes its characteristics. The Heisenberg effect is especially apparent during the debugging of time-dependent applications, since these applications rely on specific timing and synchronization conditions that are significantly altered when the program is executed step-by-step with the debugger.

[0008] An example of this second type of problem is commonly encountered when software developers attempt to diagnose problems that have been identified by customers and other end users. Quite often, software problems appear for the first time at a customer's site. When trying to debug these problems at the development site (typically in response to a bug report), the developer often discovers that the problem cannot be reproduced. The reasons for this inability to reproduce the bug may range from an inaccurate description given by the customer, to a difference in environments such as files, memory size, system library versions, and configuration information. Distributed, client/server, and parallel systems, especially multi-threaded and multi-process systems, are notorious for having non-reproducible problems because these systems depend heavily on timing and synchronization sequences that cannot easily be duplicated.

[0009] When a bug cannot be reproduced at the development site, the developer normally cannot use a debugger, and generally must resort to the tedious, and often unsuccessful, task of manually analyzing the source code. Alternatively, a member of the software development group can be sent to the customer site to debug the program on the computer system on which the bug was detected. Unfortunately, sending a developer to a customer's site is often prohibitively time consuming and expensive, and the process of setting up a debugging environment (source code files, compiler, debugger, etc.) at the customer site can be burdensome to the customer.

[0010] Some software developers attempt to resolve the problem of monitoring the execution of an application by imbedding tracing code in the source code of the application. The imbedded tracing code is designed to provide information regarding the execution of the application. Often, this imbedded code is no more than code to print messages which are conditioned by some flag that can be enabled in response to a user request. Unfortunately, the imbedded code solution depends on inserting the tracing code into the source prior to compiling and linking the shipped version of the application. To be effective, the imbedded code must be placed logically near a bug in the source code so that the trace data will provide the necessary information. Trying to anticipate where a bug will occur is, in general, a futile task. Often there is no imbedded code where it is needed, and once the application has been shipped it is too late to add the desired code.

[0011] Another drawback of current monitoring systems is the inability to correctly handle parallel execution, such as in a multiprocessor system. The monitoring systems mentioned above are designed for serial execution (single processor) architectures. Using serial techniques for parallel systems may cause several problems. First, the sampling activity done in the various parallel entities (threads or processes) may interfere with each other (e.g., the trace data produced by one entity may be overwritten by another entity). Second, the systems used to analyze the trace data cannot assume that the trace is sequential. For example, the function call graph in a serial environment is a simple tree. In a parallel processing environment, the function call graph is no longer a simple tree, but a collection of trees. There is a time-based relationship between each tree in the collection. Displaying the trace data as a separate calling tree for each entity is not appropriate, as this does not reveal when, during the execution, context switches were done between the various parallel entities. The location of the context switches in the execution sequence can be very important for debugging problems related to parallel processing.

[0012] Moreover, the computing model used in the Microsoft Windows environment, which is based on the use of numerous sophisticated and error-prone applications with many components interacting in a complex way, requires a significant effort for system servicing and support. Many Windows problems experienced by users are software configuration errors that commonly occur when the users add new programs and devices to their computers. Problems also occur due to the corruption of important system files, resources, or setups. Another important source of software malfunctioning is “unexpected” user behavior that was not envisioned by the software developers (as occurs when, for example, the user inadvertently deletes a file needed by the application).

SUMMARY OF THE INVENTION

[0013] The present invention overcomes these and other problems associated with debugging and tracing the execution of computer programs. The present invention provides features that allow a remote software developer or help desk person to debug configuration problems such as missing or corrupted environment variables, files, DLLs, registry entries, and the like. In one embodiment, a “visual problem monitor” system includes an information-gathering module that gathers run-time information about program execution, program interaction with the operating system and the system resources. The information-gathering module also monitors user actions and captures screen output. In one embodiment, file interactions, DLL loading and/or registry accesses are monitored non-intrusively. In one embodiment, the relevant support information captured by the information-gathering module is saved in a log file. The information-gathering module passes the gathered information to an information-display module. In one embodiment, the information-gathering module attaches to the running program using a hooking process. The program being monitored need not be specially modified or adapted to allow the information-gathering module to attach.

[0014] The information-display module allows a support technician (e.g., a software developer, a help desk person, etc.) to see the user interactions with the program and corresponding reactions of the system. This eliminates the “questions and answers” game that support personnel often play with users in order to understand what the user did and what happened on the user's PC. In one embodiment, the information-display module allows the support technician to remotely view environment variables, file access operations, system interactions, and user interactions that occur on the user's computer. In one embodiment, the information-display module allows the support technician to remotely view crash information (in the event of a crash on the user's computer), system information from the user's computer, and screen captures from the user's computer.

[0015] One aspect of the present invention is a software system that facilitates the process of identifying and isolating bugs within a client program by allowing a developer to trace the execution paths of the client. The tracing can be performed without requiring modifications to the executable or source code files of the client program. In one embodiment, the system interaction tracing can be performed even without any knowledge of the source code or debug information of the client. Preferably, the trace data collected during the tracing operation is collected according to instructions in a trace control dataset, which is preferably stored in a Trace Control Information (TCI) file. Typically, the developer generates the TCI file by using a trace options editor program having a graphical user interface. The options editor displays the client's source code representation on a display screen together with controls that allow the software developer to interactively specify the source code and data elements to be traced. The options editor may use information created by a compiler or linker, such as debug information, in order to provide more information about the client and thereby make the process of selecting trace options easier. Once the trace options are selected, the client is run on a computer, and a tracing library is used to attach to the memory image of the client (the client process). The tracing library is configured to monitor execution of the client, and to collect trace data, based on selections in the trace options. The trace data collected by the tracing library is written to an encoded buffer in memory. The data in the buffer may optionally be saved to a trace log file for later use.

[0016] The developer then uses a trace analyzer program, also having a graphical user interface, to decode the trace information into a human-readable form, again using the debug information, and to display the translated trace information on the display screen so that the developer can analyze the execution of the client program. In a preferred embodiment, the trace options editor and the trace analyzer are combined into a single program called the analyzer. The analyzer is preferably configured to run under the control of a multi-process operating system and to allow the developer to trace multiple threads and multiple processes. The tracing library is preferably configured to run in the same process memory space as the client, thereby tracing the execution of the client program without the need for context switches.

[0017] In one embodiment, the software system provides a remote mode that enables the client program to be traced at a remote site, such as by the customer at a remote customer site, and then analyzed at the developer site. When the remote mode is used, the developer sends the TCI file for the particular client to a remote user site together with a small executable file called the tracing “agent.” The agent is adapted to be used at the remote user site as a stand-alone tracing component that enables a remote customer, who does not have access to the source code of the client, to generate a trace file that represents execution of the client application at the remote site. The trace file is then sent to the developer site (such as by email), and is analyzed by the software developer using the analyzer. The remote mode thus enables the software developer to analyze how the client program is operating at the remote site, without the need to visit the remote site, and without exposing to the customer the source code or other confidential details of the client program.

[0018] The software system also preferably implements an online mode that enables the software developer to interactively trace and analyze the execution of the client. When the software system is used in the online mode, the analyzer and agent are effectively combined into one program that a developer can use to generate trace options, run and trace the client, and display the trace results in near real-time on the display screen during execution of the client program.

[0019] In one embodiment, the support technician typically uses a default TCI file that allows the trace system to trace interactions and other important API functions without access to source code and/or debug information. This is useful for troubleshooting commercial applications such as Microsoft Office, Internet Information Server, CRM and ERP systems, and other legacy products and the like.

BRIEF DESCRIPTION OF THE DRAWINGS

[0020] A software system which embodies the various features of the invention will now be described with reference to the following drawings.

[0021] FIG. 1A is a block diagram illustrating the use of the system to create a trace control information file.

[0022] FIG. 1B is a block diagram illustrating the use of the system in remote mode.

[0023] FIG. 1C is a block diagram illustrating the use of the system to analyze a trace log file.

[0024] FIG. 2 is a block diagram illustrating the use of the system in online mode.

[0025] FIG. 3A is an illustration of a typical main frame window provided by the system's trace analyzer module.

[0026] FIG. 3B is an illustration of a typical main frame window showing multiple threads.

[0027] FIG. 4 illustrates a process list window that lists the processes to be traced.

[0028] FIG. 5 illustrates the trace options window that allows a developer to select the functions to be traced and the information to be collected during the trace.

[0029] FIG. 6 illustrates a file page window that provides a hierarchical tree of trace objects listed according to hierarchical level.

[0030] FIG. 7 illustrates a class page window that provides a hierarchical tree of trace objects sorted by class.

[0031] FIG. 8 illustrates the process page window that provides a hierarchical tree that displays the traced process, and the threads for each process.

[0032] FIG. 9 illustrates the running process window that allows the user to attach to and start tracing a process that is already running.

[0033] FIG. 10 illustrates the start process window that allows the user to load an executable file, attach to the loaded file, execute the loaded file, and start tracing the loaded file.

[0034] FIG. 11 shows a trace detail pane that displays a C++ class having several members and methods, a class derived from another class, and classes as members of a class.

[0035] FIG. 12 illustrates a trace tree pane, showing a break (or tear) in the trace tree where tracing was stopped and then restarted.

[0036] FIG. 13 is a flowchart which illustrates the process of attaching to (hooking) a running process.

[0037] FIG. 14 is a flowchart which illustrates the process of loading an executable file and attaching to (hooking) the program.

[0038] FIG. 15 is a block diagram showing the architecture of the visual problem monitor system including the information-gathering module and the information-display module.

[0039] FIG. 16 shows a multi-window display provided by the information-display module.

[0040] FIG. 17 is a flowchart illustrating the use of the system to solve software support problems.

[0041] In the drawings, like reference numbers are used to indicate like or functionally similar elements. In addition, the first digit or digits of each reference number generally indicate the figure number in which the referenced item first appears.

DETAILED DESCRIPTION OF THE PREFERRED EMBODIMENT

[0042] The present invention provides a new model for software diagnostics by tracing the execution path of a computer program and user interaction with the computer program. In the preferred embodiment of the invention, this tracing model is implemented within a set of tracing and debugging tools that are collectively referred to as the BugTrapper system (“BugTrapper”). The BugTrapper tools are used to monitor and analyze the execution of a computer program, referred to as a client. One feature of the BugTrapper is that it does not require special instructions or commands to be imbedded within the source code of the client, and it does not require any modifications to be made to the source or executable files of the client. “Tracing,” or “to trace,” refers generally to the process of using a monitoring program to monitor and record information about the execution of the client while the client is running. A “trace” generally refers to the information recorded during tracing. Unlike conventional debuggers that use breakpoints to stop the execution of a client, the BugTrapper tools collect data while the client is running. Using a process called “attaching”, the BugTrapper tools instrument the client by inserting interrupt instructions at strategic points defined by the developer (such as function entry points) in the memory image of the client. This instrumentation process is analogous to the process of connecting a logic analyzer to a circuit board by connecting probes to test points on the circuit board. When these interrupts are triggered, the BugTrapper collects trace information about the client without the need for a context switch, and then allows the client to continue running.

[0043] The BugTrapper implementations described herein operate under, and are therefore disclosed in terms of, the Windows-NT/2000 and Windows-95/98 operating systems and the like. It will be apparent, however, that the underlying techniques can be implemented using other operating systems that provide similar services. Other embodiments of the invention will be apparent from the following detailed description of the BugTrapper.

[0044] Overview of BugTrapper System and User Model

[0045] The BugTrapper provides two modes of use: remote mode and online mode. As discussed in more detail in the following text accompanying FIGS. 1A-1C, using remote mode a developer can trace the remote execution of a program that has been shipped to an end user (e.g. a customer or beta user) without providing a special version of the code to the user, and without visiting the user's site or exposing the source code level details of the program to the user. The system can also be used in an online mode wherein the developer can interactively trace a program and view the trace results in real time.

[0046] Remote Mode

[0047] Remote mode involves three basic steps shown in FIGS. 1A through 1C. In step 1, shown in FIG. 1A, a developer 112 uses a program called the BugTrapper analyzer 106 to create a file called a trace control information (TCI) file 120. The TCI file 120 contains instructions that specify what information is to be collected from a program to be traced (the client). The analyzer 106 obtains information about the client from a build (e.g., compile and link) by-product, such as a link map file, or, as in the preferred embodiment, a debug information file 121. Typically, the debug information file 121 will be created by a compiler and will contain information such as the names and addresses of software modules, call windows, etc. for the specific client. The developer 112 then sends the TCI file 120 and a small tracing application called the agent 104 to a user 110 as shown in FIG. 1B. The user 110 runs the agent 104 and the client 102 and instructs the agent 104 to attach to the client 102. The agent attaches to the client 102 by loading a client-side trace library 125 into the address space of the client 102. An agent-side trace library 124 is provided in the agent 104. The client-side trace library 125 and the agent-side trace library 124 are referred to collectively as the “trace library.” The agent-side trace library 124 and the client-side trace library 125 exchange messages through normal interprocess communication mechanisms, and through a shared memory trace buffer 105. The agent-side trace library 124 uses information from the TCI file 120 to attach the client-side trace library 125 into the client 102, and thereby obtain the trace information requested by the developer 112.

[0048] The agent 104 and the client-side trace library 125 run in the same context so that the client 102 can signal the client-side trace library 125 without performing a context switch and thus without incurring the overhead of a context switch. For the purposes herein, a context can be a process, a thread, or any other unit of dispatch in a computer operating system. The client 102 can be any type of software module, including but not limited to, an application program, a device driver, or a dynamic link library (DLL), or a combination thereof. The client 102 can run in a single thread, or in multiple processes and/or multiple threads.

[0049] In operation, the agent 104 attaches to the client 102 using a process known as “attaching.” The agent 104 attaches to the client 102, either when the client 102 is being loaded or once the client 102 is running. Once attached, the agent 104 extracts trace information, such as execution paths, subroutine calls, and variable usage, from the client 102. Again, the TCI file 120 contains instructions to the client-side trace library 125 regarding the trace data to collect. The trace data collected by the client-side trace library 125 is written to the trace buffer 105. On command from the user 110 (such as when a bug manifests itself), the agent 104 copies the contents of the trace buffer 105 to a trace log file 122. In some cases, the log data is written to a file automatically, such as when the client terminates. The user 110 sends the trace log file 122 back to the developer 112. As shown in FIG. 1C, the developer 112 then uses the analyzer 106 to view the information contained in the trace log file 122. When generating screen displays for the developer 112, the analyzer 106 obtains information from the debug information file 121. Since the analyzer 106 is used to create the TCI file 120 and to view the results in the trace log file 122, the developer can edit the TCI file 120 or create a new TCI file 120 while viewing results from a trace log file 122.

[0050] Remote mode is used primarily to provide support to users 110 that are located remotely relative to the developer 112. In remote mode, the agent 104 is provided to the user 110 as a stand-alone component that enables the user to generate a trace log file that represents the execution of the client. The TCI file 120 and the trace log file 122 both may contain data that discloses secrets about the internal operation of the client 102 and thus both files are written using an encoded format that is not readily decipherable by the user 110. Thus, in providing the TCI file 120 and the agent 104 to the user, the developer 112 is not divulging information to the user that would readily divulge secrets about the client 102 or help the user 110 in an attempt to reverse engineer the client 102. The agent 104 traces the client without any need for modification of the client. The developer 112 does not need to build a special version of the client 102 executable file and send it to the customer, nor does the customer need to pre-process the client executable file before tracing.

[0051] From the perspective of the remote user, the agent 104 acts essentially as a black box that records the execution path of the client 102. As explained above, the trace itself is not displayed on the screen, but immediately after the bug reoccurs in the application, the user 110 can dump the trace data to the trace log file 122 and send this file to the developer 112 (such as by email) for analysis. The developer 112 then uses the analyzer 106 to view the trace log file created by the user 110 and identify the problematic execution sequence. In remote mode, the user 110 does not need access to the source code or the debug information. The agent 104, the TCI file 120, and the trace log file 122 are preferably small enough to be sent via email between the developer 112 and the user 110. Further details regarding the remote mode of operation are provided in the sections below.

[0052] Online Mode

[0053] As shown in FIG. 2, the BugTrapper may also be used in an online mode rather than remote mode as shown in the previous figures. In this mode, the BugTrapper is used by the developer 112 to locally analyze a client 102, which will typically be a program that is still being developed. For example, the online mode can be used as an aid during the development as a preliminary or complementary step to using a conventional debugger. In many cases it is hard to tell exactly where a bug resides and, therefore, where breakpoints should be inserted. Online mode provides the proper basis for setting these breakpoints. Later, if further analysis is required, a more conventional debugger can be used. In online mode, the analyzer 106 is used to perform all of its normal operations (e.g. creating the TCI file 120 and viewing the trace results) as well as the operations performed by the agent 104 in remote mode. Thus, in online mode, the agent 104 is not used because it is not needed. The developer 112 uses the analyzer 106 to run the client 102 and attach the client-side trace library 125 to the client 102. In online mode, the analyzer 106 reads the trace buffer 105 in near real-time to provide near real-time analysis functionality. In the online mode, the analyzer 106 immediately displays the trace information to the developer 112.

[0054] The developer 112 uses the analyzer 106 to interactively create trace control information (TCI). The TCI may be sent to the client-side trace library 125 via file input/output operations or through conventional inter-process communication mechanisms such as shared memory, message passing or remote procedure calls. The TCI indicates to the client-side trace library 125 what portions of the client 102 to trace, and when the tracing is to be performed. As the client program 102 runs, the client-side trace library 125 collects the trace information and relays the information back to the analyzer 106, which displays the information in near real-time within one or more windows of the BugTrapper.

[0055] Operational Overview of the Tracing Function

[0056] Regardless of which operational mode is used (online or remote), the client 102 is run in conjunction with the client-side trace library 125. As described in detail below, the client-side trace library 125 is attached to the in-memory image of the client 102 and generates trace information that describes the execution of the client 102. The TCI file 120, provided by the developer 112, specifies where tracing is to take place and what information will be stored. Because the client is traced without the need for context switches, the effect of this tracing operation on the performance of the client 102 is minimal, so that even time-dependent bugs can be reliably diagnosed. As described below, this process does not require any modification to the source or object code files of the client 102, and can therefore be used with a client 102 that was not designed to be traced or debugged.

[0057] The analyzer 106 is used to analyze the trace data and isolate the bug. The developer 112 may either analyze the trace data as it is generated (online mode), or the developer 112 may analyze trace data stored in the trace log file 122 (mainly remote mode). As described below, the assembly level information in the trace log file is converted back to a source level format using the same debug information used to create the TCI file 120. During the trace analysis process, the analyzer 106 provides the developer 112 with execution analysis options that are similar to those of conventional debuggers, including options for single stepping and running forward through the traced execution of the client 102 while monitoring program variables. In addition, the analyzer 106 allows the developer 112 to step backward in the trace, and to search for breakpoints both in the future and in the past.
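By way of illustration only, the following Python sketch shows why stepping backward and searching in both directions are straightforward once the trace has been recorded: the analyzer moves a cursor over stored records rather than re-executing the client. The names TraceRecord and TraceCursor and the record fields are hypothetical and are not taken from the BugTrapper implementation.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class TraceRecord:
    """One recorded event; the field names are illustrative assumptions."""
    index: int
    function: str
    line: int


class TraceCursor:
    """Moves over an already-recorded trace, so stepping back is an index change."""

    def __init__(self, records: List[TraceRecord]) -> None:
        if not records:
            raise ValueError("trace must contain at least one record")
        self.records = records
        self.pos = 0

    def step_forward(self) -> TraceRecord:
        # Clamp at the last record instead of running off the end.
        self.pos = min(self.pos + 1, len(self.records) - 1)
        return self.records[self.pos]

    def step_backward(self) -> TraceRecord:
        # Clamp at the first record.
        self.pos = max(self.pos - 1, 0)
        return self.records[self.pos]

    def find_breakpoint(self, function: str, line: int,
                        forward: bool = True) -> Optional[TraceRecord]:
        """Find the next (or previous) record matching a breakpoint location."""
        if forward:
            candidates = range(self.pos + 1, len(self.records))
        else:
            candidates = range(self.pos - 1, -1, -1)
        for i in candidates:
            rec = self.records[i]
            if rec.function == function and rec.line == line:
                self.pos = i
                return rec
        return None  # no match in that direction; the cursor does not move
```

A breakpoint in this sketch is simply a search predicate over recorded events, which is why it can be applied "in the past" as easily as "in the future."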

[0058] The attaching mechanism used to attach the client-side trace library 125 to the client 102 involves replacing selected object code instructions (or fields of such instructions) of the memory image of the client 102 with interrupt (INT) instructions to create trace points. The locations of the interrupts are specified by the TCI file 120 that is created for the specific client 102. When such an interrupt instruction is executed, a branch occurs to the client-side trace library 125. The client-side trace library 125 logs the event of passing the trace point location and captures pre-specified state information, such as values of specific program variables and microprocessor registers. The instructions that are replaced by the interrupt instructions are maintained within a separate data structure to preserve the functionality of the application.
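By way of illustration only, the following Python sketch models the bookkeeping described above on a byte array that stands in for the client's memory image. It is a toy model under stated assumptions: the class name TracePoints is hypothetical, the x86 one-byte INT 3 opcode (0xCC) is used merely as an example interrupt, and no real process is patched.

```python
INT3 = 0xCC  # x86 one-byte breakpoint opcode, chosen here only as an example


class TracePoints:
    """Toy model of trace points patched into a copy of a client's code bytes."""

    def __init__(self, image: bytearray) -> None:
        self.image = image
        self.saved = {}  # address -> original byte displaced by the interrupt
        self.log = []    # addresses of trace points that have fired

    def insert(self, address: int) -> None:
        if address in self.saved:
            return  # already patched; keep the true original byte
        self.saved[address] = self.image[address]
        self.image[address] = INT3

    def on_interrupt(self, address: int) -> int:
        """Log the hit and return the displaced byte so execution can continue."""
        self.log.append(address)
        return self.saved[address]

    def remove_all(self) -> None:
        """Restore every displaced byte, leaving the image as it was found."""
        for address, original in self.saved.items():
            self.image[address] = original
        self.saved.clear()


# Example: push ebp / mov ebp, esp / ret, with trace points at entry and return.
code = bytearray([0x55, 0x8B, 0xEC, 0xC3])
points = TracePoints(code)
points.insert(0)
points.insert(3)
```

Keeping the displaced bytes in a separate table, rather than in the image, is what lets the original instruction still be carried out after the trace point is logged.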

[0059] Overview of the Analyzer User Interface

[0060] The analyzer 106 comprises a User Interface module that reads trace data, either from the trace buffer 105 (during online mode tracing) or from the trace log file 122 (e.g. after remote tracing) and displays the data in a format, such as a trace tree, that shows the sequence of traced events that have occurred during execution of the client 102. Much of the trace data comprises assembly addresses. With reference to FIG. 1C, the analyzer 106 uses the debug information 121 to translate the traced assembly addresses to comprehensive strings that are meaningful to the developer. In order to save memory and gain performance, this translation to strings is preferably done only for the portion of the trace data which is displayed at any given time, not the whole database of trace data. Thus, for example, in formatting a screen display in the user interface, only the trace data needed for the display in the user interface at any given time is read from the log file 122. This allows the analyzer 106 to display data from a trace log file 122 with more than a million trace records.
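By way of illustration only, the display-driven translation described above can be sketched in Python as follows. The function name visible_rows and its parameters are hypothetical; the translate callback stands in for whatever lookup the debug information supports.

```python
from typing import Callable, List, Sequence


def visible_rows(addresses: Sequence[int],
                 first_row: int,
                 row_count: int,
                 translate: Callable[[int], str]) -> List[str]:
    """Translate only the trace records that fall inside the visible window."""
    window = addresses[first_row:first_row + row_count]
    return [translate(address) for address in window]


# Example: a million-record trace, of which only three rows are on screen.
translated = []


def fake_translate(address: int) -> str:
    translated.append(address)  # record which addresses were actually looked up
    return hex(address)


trace = range(0x1000, 0x1000 + 1_000_000)
rows = visible_rows(trace, 500, 3, fake_translate)
```

In this sketch the cost of formatting a screen depends on the number of rows shown, not on the length of the trace.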

[0061] The debug information 121 is preferably created by a compiler when the client is compiled. Using the debug information 121, the analyzer translates function names and source lines to addresses when creating the TCI file 120. Conversely, the analyzer 106 uses the debug information 121 to translate addresses in the trace data back into function names and source lines when formatting a display for the user interface. One skilled in the art will recognize that other build information may be used as well, including, for example, information in a linker map file and the Type Library information available in a Microsoft OLE-compliant executable.

[0062] Preferably, the debug information is never used by the trace libraries 124, 125 or the agent 104, but only by the analyzer 106. This is desirable for speed because debug information access is typically relatively slow. This is also desirable for security since there is no need to send to the user 110 any symbolic information that might disclose confidential information about the client 102.

[0063] The analyzer 106 allows the developer 112 to open multiple trace tree windows and define a different filter (trace control instructions) for each window. When reading a trace record, each window filter is preferably examined separately to see if the record should be displayed. The filters from the various windows are combined in order to create the TCI file 120, which is read by the client-side trace library 125. In other words, the multiple windows with different filters are handled by the User Interface, and the client-side trace library 125 reads from a single TCI file 120.
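
The relationship between per-window filters and the single combined set of trace control instructions can be sketched in C++ as follows. The sketch assumes that a filter is simply a set of function names and that "combining" means taking the union of the sets; the disclosure does not specify the combination rule, and the names Filter, combineFilters and windowsShowing are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <set>
#include <string>
#include <vector>

// Assumption: a window filter is the set of function names the window displays.
using Filter = std::set<std::string>;

// Combine the filters of all windows into one set of functions to trace,
// as would be written to a single TCI file (assumed here to be the union).
Filter combineFilters(const std::vector<Filter>& windows) {
    Filter combined;
    for (const Filter& f : windows) combined.insert(f.begin(), f.end());
    return combined;
}

// For one trace record, examine each window filter separately and return
// the indices of the windows that should display the record.
std::vector<std::size_t> windowsShowing(const std::vector<Filter>& windows,
                                        const std::string& function) {
    std::vector<std::size_t> result;
    for (std::size_t i = 0; i < windows.size(); ++i) {
        if (windows[i].count(function) != 0) result.push_back(i);
    }
    return result;
}
```

Under these assumptions the trace library needs to know only the combined set, while the decision of which window shows a given record remains entirely within the user interface.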

[0064] FIG. 3A is an illustration of a typical frame window 300 provided by the analyzer 106. The analyzer frame window 300 displays similar information both when performing online tracing (online mode) and when displaying a trace log file (remote mode). The frame window 300 is a split frame having four panes. The panes include a trace tree pane 310, an “executable” pane 314, a trace detail pane 316, and a source pane 318. The analyzer frame 300 further provides a menu bar 304, a dockable toolbar 306, and a status bar 312. The menu bar 304 provides drop-down menus labeled “File,” “Edit,” “View,” “Executable,” and “Help.” The trace tree pane 310 contains a thread caption bar 320, described below in connection with the Analyzer. Below the thread caption bar 320 is a trace tree 330. The trace tree 330 is a hierarchical tree control that graphically displays the current trace information for the execution thread indicated in the thread caption bar 320. The trace tree 330 displays, in a hierarchical tree graph, the sequence of function calls and returns (the dynamic call tree) in the executable programs (collectively the client 102) listed in the executable pane 314. Traced source lines also appear in the trace tree, between the call and return of the function in which the lines are located. FIG. 3A illustrates a single thread header and thread tree combination (the items 320 and 330). However, multiple thread captions and thread tree combinations will be displayed when there are context switches between multiple threads or processes.

[0065] The executable pane 314 displays an “executable” listbox 361. Each line in the executable listbox 361 displays information about an executable image that is currently being traced. Each line in the listbox 361 displays a filename field 360, a process id (PID) field 362, and a status field 364. Typical values for the status field 364 include “running,” “inactive,” and “exited.” The trace detail pane 316 contains a trace detail tree 350, which is preferably implemented as a conventional hierarchical tree control. The trace detail tree 350 displays attributes, variables such as arguments in a function call window, and function return values of a function selected in the trace tree 330. The source pane 318 displays a source listing of one of the files listed in the executable listbox 361. The source listing displayed in the source pane 318 corresponds to the source code of the function selected in the trace tree 330 or to the selected source line. The source code is automatically scrolled to the location of the selected function.

[0066] The frame window 300 also contains a title bar which displays the name of the analyzer 106 and a file name of a log or Trace Control Information (TCI) file that is currently open. If the current file has not yet been saved, the string “-New” is concatenated to the file name display.

[0067] The status bar 312 displays the status of the analyzer 106 (e.g. Ready), the source code file containing the source code listed in the source code pane 318, and the line and column number of a current line in the source pane 318.

[0068] The toolbar 306 provides Windows tooltips and the buttons listed in Table 1.

[0069] FIG. 3B shows a typical frame window 300 with multiple threads in the trace tree pane 310. FIG. 3B shows a separate trace tree for each thread and a thread caption bar (similar to the thread caption bar 320 shown in FIG. 3A) for each thread.

TABLE 1: Buttons on the toolbar 306

Button; Menu Equivalent; Key; Description
“Open”; File | Open; Ctrl+O; Opens an existing Trace Control Information file.
“Save”; File | Save; Ctrl+S; Saves the current Trace Control Information to a file.
“Clear”; Edit | Clear All; -; Clears the Trace Tree pane, the Trace Detail pane, and the Source pane.
“Find”; Edit | Find; Ctrl+F; Finds a specific string in the executable source code or trace tree.
“Bookmark”; Edit | Bookmark; -; Adds or deletes a bookmark for the currently selected function, or edits the name of an existing bookmark.
“Window”; View | New Window; -; Opens a new instance of the analyzer.
“Start/Stop”; Executable | Start/Stop Trace; -; Starts or stops tracing the executables listed in the Executable pane.
“Add”; Executable | Add; Ins; Adds an executable to the Executable pane, without running it, so that it can be run and traced at a later date.
“Run”; Executable | Run; F5; When the <New Executable> string is selected, adds an executable to the Executable pane, starts this executable and begins tracing. When an executable which is not running is selected in the Executable pane, starts this executable and begins tracing.
“Attach”; Executable | Attach; -; When the <New Executable> string is selected, attaches a running executable to the Executable pane and begins tracing. When an executable that is not traced is selected, attaches the running process of this executable, if it exists.
“Terminate”; Executable | Terminate; -; Terminates the executable currently selected in the Executable pane.
“Options”; Executable | Trace Options; -; Opens the Trace Options window in which you can specify the elements that you want to trace for the selected executable.

[0070] Using the Analyzer to Create the TCI File

[0071] The TCI file 120 specifies one or more clients 102 and the specific elements (functions, processes and so on) to be traced either in online or remote mode. The TCI information is specified in a trace options window (described in the text associated with FIG. 5). The TCI file 120 is used to save trace control information so that the same trace options can be used at a later time and to send trace control information to a user 110 to trace the client 102. The subsections that follow provide a general overview of selecting trace information for a TCI file 120 and descriptions of various trace options, different ways to access the trace options, and how to use the trace options to specify elements to be traced.

[0072] The TCI file 120 for a client 102 is interactively generated by the software developer 112 using the analyzer 106. During this process, the analyzer 106 displays the source structure (modules, directories, source files, C++ classes, functions, etc.) of the client 102 using the source code debug information 121 generated by the compiler during compilation of the client 102. As is well known in the art, such debug information 121 may be in an open format (as with a COFF structure), or proprietary format (such as the Microsoft PDB format), and can be accessed using an appropriate application program interface (API). Using the analyzer 106, the developer 112 selects the functions and source code lines to be traced. This information is then translated into addresses and instructions that are recorded within the TCI file. In other embodiments of the invention, trace points may be added to the memory image of the client 102 by scanning the image's object code “on the fly” for specific types of object code instructions to be replaced.

[0073] Trace control information is defined for a specific client 102. In order to access the trace tool, the developer 112 first adds the desired client programs 102 to the list of executables shown in the executable pane 314 shown in FIG. 3. The executable is preferably compiled in a manner such that debug information is available. In many development environments, debug information may be included in an optimized “release” build such that creation of the debug information does not affect the optimization. In a preferred embodiment, the debug information is stored in a PDB file. If during an attempt to add the executable to the Executable pane 314 a PDB file is not found by the analyzer 106, the developer 112 is prompted to specify the location of the PDB file. Once an executable has been added to the Executable pane 314, the developer 112 can set the trace control information using the available trace options described below.

[0074] To use the online mode to trace an executable that is not currently running, the developer selects an executable file to run as the client 102. To run an executable file, the developer 112 double-clicks the <New Executable> text 365 in the executable pane 314 to open a file selection window, thus allowing the developer 112 to select the required executable. Alternatively, the developer 112 can click the Run button on the toolbar 306, or select the Run option from the “Executable” menu after selecting the <New Executable> text. The file selection window provides a command line arguments text box to allow the developer 112 to specify command line arguments for the selected executable file.

[0075] After selecting an executable to be a client 102, a trace options window (as described below in connection with FIG. 5) is displayed, which allows the developer 112 to specify which functions to trace. After selecting the desired trace options and closing the trace options window, the executable starts running and BugTrapper starts tracing. As the client 102 runs, trace data is collected and immediately displayed in the analyzer frame window 300 as shown in FIG. 3.

[0076] To cause the analyzer 106 to trace an executable that is currently running, the developer 112 may click the “Attach” button on the toolbar 306 after selecting the <New Executable> text. Upon clicking the “Attach” button on the toolbar 306, a process list window 400 is displayed, as shown in FIG. 4. The process list window 400 displays either an applications list 402 or a process list (not shown). One skilled in the art will understand that, according to the Windows operating system, an application is a process that is attached to a top level window. The applications list 402 displays a list of all of the applications that are currently running. The process list window 400 also provides a process list, which is a list of the processes that are currently running. The applications list 402 is selected for display by an applications list tab and the process list is selected for display by a process list tab. To select a process from the process list window, the developer 112 clicks the Applications tab or the Processes tab as required, and then selects the application or process to be traced. The process list window 400 also provides a refresh button to refresh the application list and the process list, and an OK button to close the process list window 400.

[0077] After the developer 112 selects an application or process using the process list window 400, and closes the process list window 400, the analyzer 106 displays a trace options window 500, as shown in FIG. 5. The application or process selected in the process list window 400 becomes the client 102. The analyzer 106 can display trace data for multiple processes and applications (multiple clients); however, for the sake of simplicity, the operation of the analyzer 106 is described below primarily in terms of a single client 102. The trace options window 500 allows the developer 112 to select the functions to be traced. Selecting trace options is described below in the text in connection with FIG. 5. After selecting trace options and closing the trace options window 500, the client-side trace library 125 is attached to the client 102, and the client 102 continues to run. The client-side trace library 125 thereafter collects trace information that reflects the execution of the client 102 and sends the trace information to the analyzer 106 for display.

[0078] The developer can also add an executable file (e.g. a Windows .exe file) to the executable pane 314 without actually running the executable file. To add an executable that is not currently running (and which is not to be run yet) to the executable pane 314, the developer 112 selects the <New Executable> text 365 and then clicks the Add button on the toolbar 306, whereupon a file selection window is displayed. The developer 112 uses the file selection window to select the desired executable and closes the file selection window. The file selection window provides a text field to allow the developer to enter command line arguments for the executable. Upon closing the file selection window, the trace options window 500 is displayed, which enables the developer 112 to select the functions to trace. After selecting trace options and closing the trace options window, the selected executable is inserted into the Executable pane 314 with the status “Inactive.” The developer can then begin a trace on the inactive executable by selecting the executable in the executable pane 314 and clicking the “Run” or “Attach” buttons on the toolbar 306.

[0079] In a preferred embodiment, the developer 112 can only create a new TCI file 120 when the executable list 361 contains the names of one or more executable files. To create a TCI file 120, the developer 112 selects “Save” from the “File” menu. The developer can also open a previously saved TCI file 120 and then modify the TCI file 120 using the trace options window 500. Once a TCI file 120 has been created (or opened) the developer 112 can select an executable from the executable pane and click the “Run” or “Attach” button from the toolbar to start tracing.

[0080] FIG. 5 illustrates the trace options window 500. The trace options window 500 is divided into two panes, a filter tree pane 501 and a source code pane 504. The filter tree pane 501 is a multi-page pane having four pages: a file page 602 which is selected by a file tab 510; a class page 702 which is selected by a class tab 512; a name page 502 which is selected by a name tab 514; and a process page 802 which is selected by a process tab 516. The name page 502 is shown in FIG. 5. The file page 602 is shown in FIG. 6, the class page 702 is shown in FIG. 7, and the process page 802 is shown in FIG. 8. The trace options window also provides an “advanced” button 520 and an “add DLL” button 522.

[0081] The trace options window 500 allows the developer 112 to specify which functions to trace and what to display in the trace tree 330. The trace options window 500 allows the developer 112 to filter out functions which have already been traced. These functions will be redisplayed where they were traced if they are later re-selected for tracing. If a function is not selected for tracing in the trace options window 500, it will not be displayed in the trace tree 330. If a function that was not traced is filtered in again, it will not appear in that portion of the information that has already been displayed.

[0082] For example, consider the following C++ program:

    #include <cstdio>

    void f1() { }
    void f2() { }

    int main() {
        while (1) {
            getchar();  // wait for input before each pass through the loop
            f1();
            f2();
        }
    }

[0083] Using the above program as an example of a client 102, and assuming that the user performs the following steps:

[0084] 1. Select the functions f1, f2, and main for tracing in the trace options window 500.

[0085] 2. Execute one loop and view the resulting trace.

[0086] 3. Deselect (filter out) f2 for tracing in the Trace Options window 500.

[0087] 4. Execute the loop again.

[0088] 5. Re-select (filter in) f2 for tracing in the Trace Options window.

[0089] 6. Execute the loop once more.

[0090] Then, after Step 4 the following depicts the elements that are displayed in the trace window, with the symbol ~~~~ representing a tear in the trace as described below in connection with FIG. 12.

    main
    f1
    ~~~~    (Step 3)
    f1

[0091] After Step 6 the trace appears as follows:

    main
    f1
    f2
    ~~~~    (Step 4)
    f1
    ~~~~    (Step 5)
    f1
    f2

[0092] In the above example, after f2 was filtered in again in step 5, it was restored in the first portion of the trace because filtering out occurred after this portion had already been executed. However, f2 never returned to the second portion, which was executed after f2 had been filtered out. Therefore, changing the trace options also determines which of the functions that have already been traced will be displayed. If a traced function is then filtered out from the trace, it can later be filtered in again.
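
The filtering behavior of the f1/f2 example can be reproduced with the following C++ sketch. It assumes a simplified model in which a call is recorded only if its function is selected at the time of the call, a tear is recorded whenever the selection changes, and the display shows only recorded calls whose function is selected at the time of display. The names TraceSession, setSelected, call and display are hypothetical.

```cpp
#include <cassert>
#include <set>
#include <string>
#include <utility>
#include <vector>

const std::string kTear = "~~~~";  // marks a change of trace options

// Hypothetical, simplified model of recording and displaying a trace.
struct TraceSession {
    std::set<std::string> selected;   // functions currently selected for tracing
    std::vector<std::string> log;     // recorded function names and tears

    // Change the trace options; a tear is recorded if tracing had begun.
    void setSelected(std::set<std::string> functions) {
        selected = std::move(functions);
        if (!log.empty()) log.push_back(kTear);
    }

    // Record a call only if the function is selected right now.
    void call(const std::string& function) {
        if (selected.count(function) != 0) log.push_back(function);
    }

    // Show tears plus every recorded call whose function is selected now.
    std::vector<std::string> display() const {
        std::vector<std::string> rows;
        for (const std::string& entry : log) {
            if (entry == kTear || selected.count(entry) != 0) rows.push_back(entry);
        }
        return rows;
    }
};
```

In this model, a function that is filtered out and later filtered in reappears wherever it was actually recorded, but cannot appear in a portion of the trace that ran while it was deselected, because no record of it exists there.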

[0093] In the filter tree pane 501, the process tab 516, corresponding to the process page 802, is not displayed prior to activating a process. Each of the four pages in the filter tree pane 501 displays a tree that the developer 112 can use to select the functions to be traced and analyzed. The source code pane 504 displays a source code fragment that contains the source code for the selected function and enables the developer 112 to select the specific source lines to be traced. Each line of executable source in the source code pane 504 is provided with a check box displayed along the left edge of the pane 504. The developer 112 checks the box to select the corresponding source line for tracing.

[0094] The “advanced” button 520 opens a window which allows the developer 112 to specify which elements to display during the trace (e.g. arguments, pointers, “this” class members and return values) and the maximum string length to be traced. The add DLL button 522 opens a window which allows the developer 112 to specify DLL files to be traced. This is useful when a DLL is loaded dynamically, as described below.

[0095] The developer 112 uses the filter tree pane 501 to select functions to be traced. Four page selection tabs at the top of the filter tree pane 501 enable the developer 112 to view the functions classified (sorted) according to file (on the file page 602), class (on the class page 702), name (on the name page 502) or process (on the process page 802). The way the functions are organized is different for each classification tab. However, the tree structure that is displayed in each of the four pages operates in the same way, even though the data elements in the tree are different for each page. Thus, the following discussion relating to the filter tree applies to any of the four pages of the filter tree pane 501.

[0096] The filter tree is a tree of function names with check boxes to the left of each name. Function check boxes appear blank, checked or dimmed as follows:

[0097] Blank: No sub-element of this branch is checked.

[0098] Checked: All sub-elements of this branch are checked.

[0099] Dimmed: Some (but not all) sub-elements of this branch are checked.
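
The three check box states listed above can be derived recursively from the leaves of the filter tree, as in the following C++ sketch. The types CheckState and FilterNode and the function stateOf are hypothetical names introduced for this illustration only.

```cpp
#include <cassert>
#include <string>
#include <utility>
#include <vector>

enum class CheckState { Blank, Checked, Dimmed };

// Hypothetical filter tree node; only leaves (functions) carry their own
// checked flag, while branch states are derived from their sub-elements.
struct FilterNode {
    std::string name;
    bool checked;
    std::vector<FilterNode> children;

    explicit FilterNode(std::string n, bool c = false)
        : name(std::move(n)), checked(c) {}
};

// Blank: no sub-element checked. Checked: all sub-elements checked.
// Dimmed: some, but not all, sub-elements checked.
CheckState stateOf(const FilterNode& node) {
    if (node.children.empty()) {
        return node.checked ? CheckState::Checked : CheckState::Blank;
    }
    bool anyChecked = false;
    bool anyUnchecked = false;
    for (const FilterNode& child : node.children) {
        const CheckState s = stateOf(child);
        if (s != CheckState::Blank) anyChecked = true;      // something below is checked
        if (s != CheckState::Checked) anyUnchecked = true;  // something below is unchecked
    }
    if (!anyChecked) return CheckState::Blank;
    if (!anyUnchecked) return CheckState::Checked;
    return CheckState::Dimmed;
}
```

A branch containing a dimmed child is itself dimmed, since the dimmed child contributes both a checked and an unchecked sub-element.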

[0100] The developer 112 uses the check boxes to select the functions to trace and then closes the trace options window by clicking an OK button.

[0101] The file page 602, shown in FIG. 6, provides a hierarchical tree that lists the objects according to their hierarchical level in the following order:

+ The Process that is traced.
+ The executable and DLL files which comprise the process.
+ Static Libraries.
+ Source file directories.
+ Source files residing in these directories.
+ Classes contained in each source file and functions in each source file that do not belong to any class.
+ Functions belonging to the classes.

[0102] The source file structure is taken from the debug information (e.g., .PDB) files 121 for the client 102. If the full path name of the source file is not contained in the .PDB file, then the functions contained in that source file are located in a separate branch of the trace tree 330 under the title <Unknown Directory>. Functions that are included in the .PDB file, but whose source file is unknown, are located in a separate branch of the trace tree 330 under the title <Unknown Source File>.

[0103] The class page 702, shown in FIG. 7, provides a hierarchical tree that lists the trace objects sorted by class, ignoring their distribution amongst source files. Functions that do not belong to a specific class are located in a separate branch of the trace tree 330 under the title <Functions>. The name page 502, shown in FIG. 5, provides a hierarchical tree that lists functions sorted alphabetically by name. Leading underscores and class names for methods are ignored. The process page 802, shown in FIG. 8, provides a hierarchical tree that displays each process that has been selected for tracing. Under each process is a list of the threads for that process.

[0104] DLL files that are not linked with the executable but rather are loaded dynamically (e.g. libraries loaded using the LoadLibrary system call) are not shown by default in the trace options window 500. In order to trace a dynamically loaded DLL file, the dynamically loaded DLL file should be added to the list of DLL files using the Add DLL button 522 in the Trace Options window 500. Clicking the add DLL button 522 displays a file selection window. Using the file selection window, the developer 112 then selects the required DLL file. The selected DLL file is added to the filter tree in the filter tree pane 501 of the trace options window 500.

[0105] The BugTrapper can also trace DLL files loaded by an executable, even when the executable does not contain debug information. For example, if the developer 112 writes a DLL file as an add-on (e.g., an ActiveX control) to a commercial program (e.g. Microsoft Internet Explorer), the developer 112 can activate BugTrapper on the commercial program and perform a trace on the add-on.

[0106] The BugTrapper also allows the developer 112 to specify various function attributes to be displayed in the trace detail pane 316 of the analyzer frame window 300 (shown in FIG. 3) while performing a trace. The developer 112 can choose to display arguments, pointers, “this” class members and return values. One skilled in the art will recognize that under the terminology of C++, a “this” class member is a class member that is referenced by the C++ “this” pointer. The developer 112 can also specify the maximum string length to be displayed. Because each selected option adds data to every trace record, selecting more options generally reduces the number of records in the trace log file and thus reduces the amount of execution time that is logged. The discussion below regarding the cyclic trace buffer provides further details of how much execution time is actually logged. The advanced button provides access to an advanced options window (not shown).

[0107] Selecting the arguments checkbox causes function arguments to be displayed in the trace detail pane 316. Selecting the “pointers” checkbox causes the data pointed to by a first level function argument of the pointer type to be displayed. In other words, selecting the pointers checkbox causes function arguments that are pointers to be de-referenced for the display. The developer 112 may select the “this” checkbox to have all members in a class displayed in the trace detail pane 316 when there is a call to a method which has a “this” pointer. The developer 112 may select the return checkbox to have function return values displayed in the trace detail pane 316.

[0108] The BugTrapper also allows the developer 112 to control tracing of specific source lines. In the source code pane 504, a checkbox is located to the left of each executable source line that can be traced. To view the source code fragment containing a specific function, the developer 112 selects the required function in the filter tree pane 501 and the analyzer 106 displays the appropriate source code fragment in the source code pane 504. If the analyzer 106 cannot locate the source code, then the source code is not displayed and the developer 112 may press the spacebar or right-click in the source code pane 504 and select a “Source Location” command from a pop-up menu. The “Source Location” command opens a dialog box which allows the developer 112 to specify a source code file to be displayed in the source code pane 504. The appropriate source code is then displayed in the source code pane 504, as shown in FIG. 5.

[0109] To select the source code lines to trace, the developer clicks the check boxes corresponding to the desired lines. To select multiple lines, the developer 112 can either press CTRL+A to select the whole source code file, or drag the mouse along several lines and thereby select a group of lines. The developer 112 can then click on any checkbox in the selected area to check all the selected lines or click on a checkbox that is already checked to deselect all the selected lines. If lines of code in a file are selected for tracing, then the filename is displayed in blue. The developer 112 may also select which variables (e.g., local variables, global variables, static variables, etc.) should be traced for each traced line.

[0110] If a client 102 is modified and recompiled, it may not be desirable to use an existing TCI file for that client 102 (for example, a function that was selected for tracing may have been removed from the modified and recompiled version). Whenever the BugTrapper encounters an outdated TCI file 120, it issues a warning and then continues to trace based on a heuristic algorithm, which attempts to match the trace instructions to the modified client executable. Trace information for an application that may be recompiled at some future date can be supplemented by saving the trace information to an Extended Trace Control Information (TCE) file rather than a regular TCI file 120. The TCE file contains extra symbolic information (such as function names) that is not part of a regular TCI file 120. Using the extra symbolic information greatly increases the chances that the heuristic trace algorithm will produce the desired results. It is especially desirable to use a TCE file at the user 110 site when the client 102 is frequently modified, and the developer 112 does not want to redefine the trace options after each compilation. The TCE file is identified by a .TCE extension.

[0111] The developer may save a TCI file 120 by clicking the save button on the toolbar 306, whereupon the trace control information is saved. The first time that information is saved to a new TCI file 120, a file selection window appears. In the file selection window, the developer 112 may select the type of file (TCI or TCE) in a “Save as” type box.

[0112] The TCI file 120 can be used to trace a local client 102 at a later time, or it can be sent to a user 110 for use with the agent 104 to trace a client 102 at a remote site. In a preferred embodiment, for remote tracing, the developer 112 sends the user 110 a self-extracting zip file that contains the agent 104 and the TCI file 120.

[0113] Using the Agent

[0114] As described above, the agent 104 is an executable module which the developer 112 can provide to a user 110 along with a Trace Control Information (TCI) file in order to trace a client 102. The trace data collected by the agent 104 are written to the trace log file 122 which the user sends to the developer 112. The developer 112 uses the analyzer 106 to view the contents of the trace log file and analyze the trace information in the log file 122. Trace analysis using the analyzer 106 is discussed in subsequent sections of this disclosure. The present section discusses the procedures for starting the agent 104, including the first step performed by the user 110 to run the agent 104. The present section also discloses techniques for selecting the TCI file 120, specifying a directory for the trace log file 122, specifying the client 102, and, finally, using the agent 104 to control the logging of trace data. The agent 104 is an easy-to-run standalone application, with step-by-step instructions provided on the screen. To trace an application, the user 110 needs both the agent 104 and the TCI file 120. The TCI file 120 is prepared, as described above, by the developer 112 and contains information about the client 102 and the specific functions to be traced.

[0115] In a preferred embodiment, the developer supplies the agent 104 as a self-extracting zip file that can be installed by simply double clicking on the zip file name. At the end of the installation, the user 110 can launch the agent 104. When the agent 104 is launched, it displays a TCI select window (not shown) which is a conventional file select dialog that allows the user to select the TCI file 120. Likewise, the agent 104 provides a log file window, which allows the user 110 to select a directory for the log file 122. The default log file is the last log file that was opened by the agent 104. The next step in using the agent 104 is to specify the client 102 executable(s) to trace.

[0116] If an executable specified in the TCI file 120 is already running, an attach to running processes window (running window) 900 is displayed, as shown in FIG. 9. The running window 900 provides a finish button 902, a cancel button 904, a back button 906, and a list of processes 908. The list of processes 908 shows any currently running processes that are specified in the TCI file 120. The list of processes 908 shows all processes that are specified in the TCI file 120 that are not currently running as disabled (grayed). The running window 900 allows the user 110 to select a currently running process to trace by selecting items in the list 908. Preferably, the user 110 will deselect any executables that are to be re-run from the start (that is, when the user does not want to attach to an executable that is already running). To select a running process, the user 110 selects a process from the list 908, and then presses the finish button 902 to cause the BugTrapper to attach to the client processes and start to collect trace data.

[0117] If an executable specified in the TCI file is not currently running, then a start processes window (start window) 1000 is displayed, as shown in FIG. 10. The start window 1000 provides a finish button 1002, a cancel button 1004, a back button 1006, and a list of executable files 1010. The start window 1000 also provides a path field 1012, a parameters field 1014, and a directory field 1016. The list of files 1010 shows the executables that are specified in the TCI file. The start window 1000 allows the user to specify executables that are not currently running to be traced. The agent 104 will run the selected client(s) 102 and trace them according to the information in the TCI file 120.

[0118] The file list 1010 displays the executables that are listed in the TCI file. Each item in the file list 1010 is provided with a check box. To specify the executables to run, the user 110 checks the boxes for the desired executables in the file list 1010. If the file path in the file list 1010 is not correct, then the user may enter the correct file path in the path field 1012. The user 110 may also add command line arguments in the parameters field 1014. The file path and command line steps may be repeated as needed to specify the file path and commands for additional executables. When the finish button 1002 is clicked, an agent window (described below) is displayed and the agent 104 runs the specified executables, attaches to the executable processes, and starts tracing.

[0119] The agent window (not shown) is displayed by the agent 104. The agent window displays the names of the TCI file and the log file. The agent window also contains an animated icon whose movement indicates whether trace data is currently being collected while the client 102 is running. The agent window also contains: a “Start/Stop” button to start or stop the trace; a “Clear” button to clear the trace buffer 105; a “Dump” button to save the contents of trace buffer 105 to the log file; and an “Exit” button to exit the agent 104.

[0120] The “Start/Stop” button allows the user 110 to stop and restart tracing when desired. Stopping the trace may improve system performance. The “Start/Stop” button toggles between Stop and Start according to the tracing status. The animated icon moves when tracing is in progress. The “Clear” button allows the user 110 to clear the trace buffer 105. The cleared information is not stored in the log file 122 unless the user first uses the dump button. The dump button allows the user 110 to save the contents of the trace buffer 105 to the log file 122. On the first save after a new process has been started, the agent 104 overwrites the old log file 122 (if one exists). On subsequent saves, new information is appended to the existing log file 122. Clicking the exit button causes the agent 104 to exit. Upon exiting, the trace buffer is written to the log file. Note that the trace information is written to the log file when either dump or exit is clicked and also when the traced application crashes or terminates. The user 110 will preferably use the dump button frequently if it appears likely that the entire operating system may crash.
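
By way of illustration only, the save behavior of the dump button described above (overwrite on the first save after a process start, append thereafter) may be sketched as follows. This is a minimal sketch and not the actual agent code: the class name AgentLog and all method names are hypothetical, and the sketch assumes that dumped records are dropped from the buffer so that a later dump appends only new information, which the description does not state.

```python
class AgentLog:
    """Illustrative model of the agent's Clear and Dump buttons (hypothetical names)."""

    def __init__(self, log_path):
        self.log_path = log_path
        self.trace_buffer = []           # stands in for the trace buffer 105
        self.saved_since_start = False   # no save yet for the current process

    def new_process_started(self):
        # The next save replaces any old log file.
        self.saved_since_start = False

    def trace(self, line):
        self.trace_buffer.append(line)

    def clear(self):
        # Cleared records are lost unless they were dumped first.
        self.trace_buffer.clear()

    def dump(self):
        # First save after a process start overwrites; later saves append.
        mode = "a" if self.saved_since_start else "w"
        with open(self.log_path, mode, encoding="utf-8") as log_file:
            for line in self.trace_buffer:
                log_file.write(line + "\n")
        self.saved_since_start = True
        # Assumption: dumped records are dropped from the buffer so that a
        # later dump appends only new information.
        self.trace_buffer.clear()
```

In this sketch, a stale log left over from an earlier run is replaced by the first dump, while records cleared before a dump never reach the log file.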

[0121] In one embodiment, the user may select to write every trace line to the disk as it is traced, or the user may select to write trace lines periodically every N seconds. Such writing is useful, for example, when it appears likely that the entire operating system may crash.

[0122] Analysis of the Trace Information

[0123] The analyzer 106 is used to analyze a trace, either online as an application runs or off-line using a remote trace log. The general topics that fall under the rubric of trace analysis include starting an online trace, opening a previously saved trace log file, viewing trace information, interpreting the trace information, working with trace information, and additional trace functions that are available when performing an online trace.

[0124] The BugTrapper allows the developer 112 to trace a client 102 executable in order to pinpoint an element in the client 102 code that causes a bug. The primary device for displaying trace information in the analyzer 106 is the trace tree 330 in the trace tree pane 310 shown in FIG. 3. The trace control information (TCI) filters can be modified during trace analysis to filter out some of the available trace data according to the needs of the developer 112.

[0125] Analysis of a remote trace (or a previously saved online trace) is started by opening a previously saved trace log file and displaying the trace information that it contains in the trace tree pane 310. The log file 122 may have been created either by saving trace information using the analyzer 106 or at a remote location using the agent 104. A trace log file 122 is opened by using an “Open Log” command from the “File” pull down menu found on the menu bar 304. Once a trace log file 122 is open, the title bar 302 displays the name and path of the opened log file 122, and the developer can view the trace information using various panes in the analyzer frame window 300. Trace information is displayed in the trace tree pane 310, the trace detail pane 316, and the source pane 318.

[0126] The trace tree 330, in the trace tree pane 310, is a hierarchical tree showing trace data collected from the client 102. Trace data includes information about events that took place during execution of the client 102, including function calls, function returns, selected source lines, etc. The developer 112 can use the mouse to choose any function from the trace tree, whereupon the arguments and return values for the chosen function are shown in the trace detail pane 316, and the source for the chosen function is displayed in the source pane 318. The types of trace information displayed for online traces and for traces from log files are almost identical; however, the log file trace provides a static display, while the online trace is dynamic and can be viewed as the trace information is being collected.

[0127] The trace tree 330 displays a hierarchical tree of the sequence of function calls and returns in the client 102. The number of lines in the trace tree is shown in the trace tree pane title bar 308. The trace tree 330 is organized in a standard tree structure and the developer 112 can click on the tree control buttons to collapse or expand the view of functions belonging to lower hierarchical levels. Clicking on a function or a source line in the trace tree pane 310 causes the trace detail pane 316 and the source pane 318 to change to display information relevant to the function. Selecting a function in the trace tree 330 and pressing the delete button on the keyboard removes the selected function from the trace. This is equivalent to filtering the function out of the trace.

[0128] The trace data is written to a buffer in memory called the trace buffer 105, and from there either displayed in the trace tree pane 310 (when performing an online trace) or written to a log file (when performing a remote trace). The trace buffer 105 is organized as a circular buffer of fixed size. The size of the trace buffer 105 can be set by the developer 112. When the trace buffer 105 is full, new trace records overwrite the oldest records contained in the buffer 105. One skilled in the art will recognize that other buffering methods can be used without changing the scope of the present invention. For example, the trace information could be stored in a buffer that simply adds trace records without overwriting old records. In a preferred embodiment, loss of old data is acceptable because, when the client 102 malfunctions, the developer 112 is usually interested in the most recent records prior to the malfunction. Thus, there is usually little need to keep all of the records, especially the oldest ones. The size of the trace buffer 105 is set so that it will be big enough to hold a large number of records without consuming too many system resources. Typically, 20,000 to 40,000 records are kept.
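
By way of illustration only, a fixed-size circular buffer in which new records overwrite the oldest ones may be sketched as follows. This is a minimal sketch and not the actual implementation of the trace buffer 105: the class name TraceBuffer is hypothetical, and the capacity is counted in records rather than bytes for simplicity.

```python
from collections import deque


class TraceBuffer:
    """Fixed-capacity circular buffer; the oldest record is dropped when full."""

    def __init__(self, capacity):
        # A deque with maxlen discards items from the opposite end on overflow.
        self._records = deque(maxlen=capacity)

    def add(self, record):
        self._records.append(record)

    def snapshot(self):
        # Records in order from oldest surviving to newest.
        return list(self._records)

    def __len__(self):
        return len(self._records)
```

With a capacity of three, adding five records leaves only the three most recent, which matches the preference for the records closest to the malfunction.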

[0129] When the trace buffer 105 is written to a log file 122, the trace records are preferably appended to the end of the log file 122. In a log file, old records are not deleted, and the trace size is limited only by the available disk space.

[0130] Alternatively, when tracing online, the trace tree is actually an image of the trace buffer 105. Because of this, the trace tree will not display more records than the trace buffer 105 contains, so old records are deleted (“scrolled out” of the display). The rows counter at the top of the trace tree pane 310 indicates the number of records in the trace buffer 105 and the number of rows in the trace tree. Because the buffer 105 is circular, the number of rows in the trace tree 330 continuously grows during the beginning of the tracing process until the buffer wraps (typically 20,000 to 40,000 records). Thereafter, the number remains approximately at the same level as old records are overwritten with new records. The exact number of records that are kept in the trace buffer 105 depends on the size of the trace records. The size of each trace record is determined by the TCI options specified by the developer 112. For example, if the developer 112 requires tracing of “this” class members, the size of the records will increase and the number of records in the buffer will decrease.
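
By way of illustration only, the relationship between record size, buffer size, and the rows counter may be sketched as follows. The function names and the byte sizes used in the usage note are hypothetical and are chosen only to show the arithmetic; the description does not give the actual buffer or record sizes.

```python
def records_that_fit(buffer_bytes, record_bytes):
    """Number of fixed-size records that a buffer of buffer_bytes can hold."""
    return buffer_bytes // record_bytes


def rows_displayed(records_traced, capacity):
    """Rows in the trace tree: grows until the buffer wraps, then levels off."""
    return min(records_traced, capacity)
```

For example, assuming a 2 MB buffer, 64-byte records give a capacity of 32,768 records, while doubling the record size to 128 bytes (as might happen when “this” class members are traced) halves the capacity to 16,384 records.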

[0131] The analyzer 106 and the agent 104 can trace a multi-threaded and multi-processed client 102. When tracing a multi-threaded process, different threads are separated from each other in the trace tree pane 310 by a thread caption bar 320. For multi-process applications, similar horizontal bars, called process caption bars (not shown), separate trace lines belonging to different processes. The thread caption bar 320 and the process caption bar separate the trace tree 330 into sections. These caption bars represent a context switch in the application, between threads and between processes. Process caption bars are similar to the thread caption bar 320; therefore, any future mention of threads also applies to processes in multi-process applications.
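
By way of illustration only, the division of the trace into sections at each context switch may be sketched as follows. This is a minimal sketch under the assumption that each trace record carries the identifier of the thread that produced it; the function name and the record layout (a pair of thread ID and text) are hypothetical.

```python
from itertools import groupby


def split_into_sections(records):
    """Group consecutive records by thread ID.

    records is an iterable of (thread_id, text) pairs in execution order.
    Each returned section corresponds to one caption bar in the display.
    """
    return [
        (thread_id, [text for _, text in group])
        for thread_id, group in groupby(records, key=lambda record: record[0])
    ]
```

Because only consecutive records are grouped, a thread that regains control later in the trace starts a new section rather than being merged with its earlier one.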

[0132] The thread caption bar 320 contains a name field, a process ID number field, and a thread ID number field 321. Within the trace tree 330 itself, there is an up arrow at the top of each section, and a down arrow at the bottom of each section. Clicking the up arrow causes the displayed trace tree 330 to jump to the previous point in the trace tree 330 where the thread gained control. Clicking the down arrow causes the displayed trace tree 330 to jump to the next point in the trace tree 330 where the thread gains control. The trace tree 330 also provides an expand/collapse control button 326 to allow the developer 112 to expand and collapse the current thread view. The trace tree pane 310 also provides a vertical scroll bar for scrolling up and down through the trace tree 330. When the trace tree pane 310 is scrolled up or down to a section containing functions of lower hierarchical levels, the portion of the trace tree 330 displayed in the window is shifted leftwards. The depth of this shift, with respect to the first function called in the process, is indicated by a stack level indicator 328 appearing in a rectangle in the upper left corner under the thread caption bar 320 (as shown in FIG. 3).

[0133] The trace detail pane 316 shows available details describing the function selected in the trace tree view. FIG. 11 shows a trace detail pane 1116 that displays a C++ class having several members and methods, a class derived from another class, and classes as members of a class. The trace details are displayed in a trace detail tree 350, which is a hierarchical tree structure. A right arrow 351 in the trace detail pane 316 marks where the function is called. A left arrow at the bottom of the detail tree 350 marks where the function returned to its caller. Some of the data that can be displayed (such as the arguments) are only displayed if an option is selected in the advanced trace options. If an argument in the call window of a function is of the aggregate type, the argument's components will be displayed beneath the right arrow 351 in the form of a hierarchy tree. If an argument is of the pointer type, and pointers were selected in the advanced trace options, then the value displayed in the trace detail tree 350 will be that of the data to which the pointer points. However, for pointer fields that reside within arguments, only the address contained in the pointer will be displayed. In other words, in the preferred embodiment, the pointer is de-referenced only for the first level arguments. One skilled in the art will understand that other pointers could be de-referenced as well, and that the trace detail tree 350 could display the value pointed to by arguments deeper than the first level.

[0134] In one embodiment, the trace detail pane 316 also shows time stamps. The time stamps display the time that a function is called and the time that the function returns to its caller.

[0135] If the argument is an array of known size, then the elements of the array will be displayed. If the array size is unknown, then the value displayed is the value of the first array element. If the argument is of character pointer type, then the string value is displayed. If the argument is numeric, then the decimal, hex, or decimal and hex values are displayed, depending on the selection made in the advanced trace options. Right-clicking the mouse when it points in the trace detail pane 316 displays a popup menu which allows the developer 112 to select how numeric arguments are displayed (as decimal, hex, or decimal and hex values).
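
By way of illustration only, the three display choices for numeric arguments may be sketched as follows. The function name, the mode strings, and the exact layout of the combined decimal and hex form are hypothetical; the description states only that decimal, hex, or both values can be shown.

```python
def format_numeric(value, mode="decimal"):
    """Render an integer argument as decimal, hex, or both."""
    if mode == "decimal":
        return str(value)
    if mode == "hex":
        return hex(value)
    if mode == "both":
        # Assumed layout: decimal first, hex in parentheses.
        return f"{value} ({value:#x})"
    raise ValueError(f"unknown display mode: {mode}")
```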

[0136] The source pane 318 shows the source code for the selected function or source line selected in the trace tree 330. The source code listed in the source pane 318 is automatically scrolled to the location of the selected object, if possible. The line in the source code is displayed in bold and is pointed to by an arrow. The analyzer 106 looks for the source file in the current directory and in the directory indicated in the .PDB file. If the source file is not found, the source pane remains blank. In this case, the developer 112 can change the source file search path in order to display the source code. To change the source file path, the developer should select a function in the trace tree 330, then right-click in the source pane to open a pop-up menu, and then select the “Source Location” command from the pop-up menu. Alternatively, the developer 112 can add and remove source directories by selecting the “Options” command from the “View” menu on the menu bar 304.

[0137] The analyzer 106 provides several features which make it easier to analyze trace information and pinpoint a bug in the client 102. These features can be used both while performing an online trace and while viewing trace information from a remote log file 122. Analysis features include: saving trace information to a log file 122; printing the trace tree 330; searching for trace elements; locating a function in the trace options window 500; filtering the trace information; adding, editing, deleting and locating bookmarks; clearing the trace tree pane; and displaying multiple windows. Additional features available for online tracing include saving trace information to the log file 122.

[0138] The “Find” button on the toolbar 306 is used to initiate a search for an element in the trace tree 330. Clicking the Find button opens a “Find what” dialog box in which the developer 112 can enter a search text string. The find what dialog provides a “Find Next” button to start a search for the occurrence of the specified search text. The first occurrence of the specified text is highlighted in the relevant pane. Functions in the source code displayed in source pane 318 can be located in the trace options dialog 500 by right-clicking inside the source code in the source pane 318. The right-click opens a pop-up menu. The developer then selects a “Locate in Trace Options” command from the pop-up menu to open the trace options window 500. The trace options window 500 will open with the desired function displayed and highlighted.

[0139] The trace filter previously described in the text relating to FIG. 5 is a tool that enables the developer 112 to select the functions to trace. When using the trace filter to change the display while performing an online trace, the trace continues in the background, and when the developer 112 closes the trace options window 500 the new filter is applied to the display in the trace window 300. The developer 112 can also use the trace options window 500 to change the display while performing an off-line trace. This enables the developer 112 to filter out traced elements and display a subset of the traced information. The information contained in the log file is not modified; only the display changes.

[0140] A bookmark allows the developer 112 to mark trace lines (functions or source lines) in the trace tree 330. The developer 112 can also edit the name of a bookmark or delete the bookmark as required. Bookmarks are inserted in the trace tree 330 by using the bookmark button on the toolbar 306. Bookmarks allow easy jumps to the bookmarked element. To insert a bookmark in the trace tree 330, the developer will: select the trace line (a function or source line in the trace tree 330) to mark; press the bookmark button to open the bookmark window; type the bookmark name in the bookmark window; and click the OK button. A waving flag icon 332 appears on the left of the bookmarked trace line in the trace tree 330. The bookmark name is displayed whenever the cursor is placed over the bookmarked line. To change a bookmark name, the developer 112 repeats the steps to create a bookmark. To delete a bookmark from the trace tree 330, the developer 112 can press a delete button on the bookmark window. The “Goto Bookmark” command from the “Edit” menu is used to go to a bookmark in the trace tree 330.

[0141] Multiple instances of the analyzer 106 can be open simultaneously. Each instance can define different filter options for each window. This feature is especially useful for multi-threaded applications, where it is convenient to observe each thread in a separate window.

[0142] The analyzer 106 provides for starting and stopping of an online trace. All trace points are disabled when tracing is stopped. Stopping is helpful if the trace adversely influences the application performance and it appears that the subsequent operations in the client 102 are not relevant to the problem being analyzed. The Start/Stop Tracing button on the toolbar 306 is used to toggle tracing on and off. Tracing is stopped or restarted as specified. When tracing is stopped, the boundaries of the lost tree portion appear in the trace tree pane 310 as a tear 1202, as shown in FIG. 12. When tracing is resumed, the trace tree 330 continues under the tear 1202.

[0143] Internal Implementation Details of the BugTrapper System

[0144] The sections that follow discuss various internal operational and implementation details of the agent 104, the analyzer 106, the trace libraries 124, 125, and how these elements interact with the client 102 and the operating system.

[0145] The Attaching Mechanism

[0146] One aspect of the present invention is the attaching mechanism used by the BugTrapper to collect trace information. With traditional tools, it was necessary to manually enter trace points in the application's source code, or at a minimum, even if trace points were automatically added to the source, to re-compile the source code. With BugTrapper, tracing is accomplished by attaching to the memory image of the application (i.e., the copy of the executable code that is loaded into RAM or other memory for execution). There is no need to enter trace points into, or to otherwise modify, the source, object, or executable files of the client 102 application. No special tracing version of the client 102 is needed, and the client 102 need not be written in any special manner. Attaching to the client 102 in memory allows function calls, returns, and other source lines to be traced. The attaching mechanism also allows for the tracing of any executable, including optimized (release) builds, multi-threading and multi-processes, longjumps, signals, exceptions, and recursions.

[0147] The BugTrapper client-side trace library 125 is attached to the client 102, in part, by modifying certain executable instructions of the memory image of the client 102. This process is generally called “executable code instrumentation,” or simply “instrumentation.” The instrumentation process is performed such that the functionality of the client 102 is preserved. Because the instrumentation is made only on the memory image, there is no need to pre-process or modify the source code or executable files of the client 102. Use of the client-side trace library 125 provides significant advantages over the prior art by eliminating the need for context switches when debugging a program. Context switching has the effect of significantly slowing down the rate of execution. The tracing implementation provided by BugTrapper can therefore be used to study the real time behavior of a program and detect bugs resulting from such behavior. Although one skilled in the art will recognize that the present invention can advantageously be used with any operating system, a preferred embodiment runs under the Windows-NT/2000, Windows-95/98 and similar operating systems supplied by Microsoft Corporation. The internal details of the BugTrapper will thus be described in terms of the Windows-NT/2000/95/98 operating systems, with the understanding that the invention is not limited to said systems.

[0148] The trace libraries 124, 125 include a number of callable functions (discussed below). By using the callable functions, and system functions provided by the Win32 API (application program interface), the trace libraries perform two major tasks: (1) attaching specialty functions to the application, and (2) tracing the execution of the application's executable code. Both of these tasks are described separately below. The agent-side trace library 124 is primarily responsible for attaching the client-side trace library 125 to the client 102. The agent-side trace library 124 also provides communication with the client-side library 125. The client-side trace library 125 is primarily responsible for placing data in the trace buffer 105. In the following description, the term “client process” is used to refer to the executable code of the client 102 that has been loaded into a memory space for execution. BugTrapper refers to either the BugTrapper Agent or the BugTrapper Analyzer, depending on whether it is operating in the Online mode or the Remote mode.

[0149] The act of attaching to a currently running process is known as a Process Attach. The act of attaching to a new process, during the creation of the new process, in order to trace the new process from its start is known as a Creation Attach. In a Creation Attach it is desirable to pause the client 102 process as close as possible to its entry point so that virtually all of the functions executed by the client 102 will be traced.

[0150] In the Windows-NT/2000 compatible and Windows-95/98 compatible operating systems, each process resides at a distinct location or “address space” in memory. A DLL, such as the client-side trace library 125, which resides in another address space, cannot simply be loaded into the same address space as the client process. To overcome this limitation, BugTrapper forces the client process to load the client-side trace library 125 DLL (using a process called injection) into the process space of the client process.

[0151] Attaching to a Client Running Under Windows-NT/2000

[0152] In a preferred embodiment, the injection process for Process Attach in Windows-NT is accomplished by using the CreateRemoteThread( ) function of the Win32 API to create a remote thread in the client process and to force the newly created thread to run code in the client process. The code that is run by the remote thread is a copy of an injection function that is copied into the client process using the Win32 API WriteProcessMemory( ) function. The Process Attach involves the following sequence of events shown in FIG. 13, beginning with a procedure block 1302 where the function inst_attach( ) of the tracing library is called in BugTrapper, using the process ID (“PID”) of the client process as an argument. The function inst_attach( ) performs the following operations:

[0153] 1) It obtains a handle to the client process using OpenProcess();

[0154] 2) It allocates memory in the client process's address space using the Win32 API function VirtualAllocEx( );

[0155] 3) It copies the code for the injection function and other various data (including the full path of the Trace Library) onto the allocated memory space using the WriteProcessMemory( ) function; and

[0156] 4) It creates a new thread in the client process with CreateRemoteThread( ).

[0157] The new thread created in step 4 starts executing at the address to which the injection function was previously copied in step 3. The procedure then advances from the procedure block 1302 to a procedure block 1304 where the injection function starts running in the new thread of the client process. Using data passed to it via other parts of the memory space, the injection function loads the client-side trace library 125.

[0158] The procedure advances from the procedure block 1304 to a procedure block 1306 where the client-side trace library 125 runs in the context of the new thread while the instrumentation is taking place. The client-side trace library 125 communicates with BugTrapper (i.e., the agent-side trace library 124), handling commands, and actually performing the instrumentation.

[0159] The procedure advances from the procedure block 1306 to a procedure block 1308 where the client-side trace library 125 exits, and the injection function destroys its own thread and stops executing by calling the ExitThread( ) function. Unlike other debuggers that terminate the debugged process on exit, here the client 102 continues to run, without any substantial alteration to the functionality of the client 102.

[0160] Creation Attach is accomplished under Windows-NT by creating the client process in a suspended state, by using the CREATE_SUSPENDED flag in the CreateProcess( ) function. In this case, the previously described procedure cannot be used, since none of the system DLLs in the client process have been initialized. In particular, since KERNEL32.DLL is not loaded, the client-side trace library 125 cannot be loaded. The present attaching procedure overcomes this difficulty by performing the following attaching procedure, which begins at a procedure block 1402 shown in FIG. 14.

[0161] To attach to a new client 102, the attaching procedure begins in block 1402, in which the client process is created in a CREATE_SUSPENDED state. The attaching procedure then advances to a procedure block 1404. In the procedure block 1404, BugTrapper makes a call to the inst_prepare( ) function of the agent-side trace library 124. The inst_prepare function, using WriteProcessMemory( ) and VirtualAllocEx( ), allocates memory in the client process and copies a small assembly language code segment into the allocated space. The procedure then proceeds to a procedure block 1406 where the inst_prepare function overwrites the entry point of the client executable in the client process with a jump instruction to the new assembly code. The attaching procedure then advances to a procedure block 1408 wherein the inst_prepare function allows the client process to resume, and thereby start the initialization process for the client process. After all DLLs are initialized, including the client-side trace library 125, execution continues to the entry point of the client executable, which now contains a jump to the new assembly code. When the jump occurs, the attaching procedure advances from the procedure block 1408 to a procedure block 1410. In the procedure block 1410, the assembly code restores the original client entry point, and suspends the client process. At this point, the client process is suspended without running any executable code, but is past the initialization stage. The attaching procedure then advances to a procedure block 1412.

[0162] In the procedure block 1412, BugTrapper can now call inst_attach( ) to attach to the client process and start instrumenting it. When the attaching procedure is complete, it can allow the client process to resume. The assembly code simply jumps directly back to the original entry point of the client 102, and execution of the client 102 starts with the proper instrumentation.
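
By way of illustration only, overwriting an entry point with a jump and later restoring the original bytes may be sketched on a simulated memory image as follows. This is a minimal sketch under stated assumptions: the description says only that a jump instruction is written, so the 5-byte x86 near jump (opcode 0xE9 followed by a 32-bit relative displacement) is an assumed encoding, the function names are hypothetical, and a bytearray stands in for the memory of the client process.

```python
import struct

JMP_REL32 = 0xE9  # x86 near jump, followed by a signed 32-bit displacement
JMP_LENGTH = 5    # one opcode byte plus four displacement bytes


def encode_jump(from_addr, to_addr):
    # The displacement is relative to the address of the next instruction.
    displacement = to_addr - (from_addr + JMP_LENGTH)
    return bytes([JMP_REL32]) + struct.pack("<i", displacement)


def patch_entry_point(memory, entry, stub):
    """Overwrite the entry point with a jump to stub; return the saved bytes."""
    saved = bytes(memory[entry:entry + JMP_LENGTH])
    memory[entry:entry + JMP_LENGTH] = encode_jump(entry, stub)
    return saved


def restore_entry_point(memory, entry, saved):
    """Put the original entry-point bytes back, as the assembly stub does."""
    memory[entry:entry + len(saved)] = saved
```

Saving the overwritten bytes before patching is what allows the stub to restore the original entry point and jump back to it once attaching is complete.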

[0163] Attaching to a Client Running Under Windows-95/98

[0164] In Windows-95/98, Process Attach and Creation Attach are implemented in a manner different from the Windows-NT/2000 manner discussed above because the CreateRemoteThread API call is not supported in this operating system.

[0165] Creation Attach under Windows-95/98 exploits the fact that process initialization starts from a known entry point of kernel32.dll. BugTrapper creates the client process in the suspended mode and then calls the inst95_attach function. This function performs the following sequence of operations:

[0166] 1) It initializes the communication channel for IPC with the client process.

[0167] 2) It copies various data (such as the Injection Function code and the path for the client-side trace library 125) into the client's address space, using the WriteProcessMemory function.

[0168] 3) It initializes a shared heap memory.

[0169] 4) It copies onto the heap a small piece of assembler code (a patch) that executes the jump to the function that creates a thread in the client process.

[0170] 5) It copies the injection function itself.

[0171] 6) It patches the entry point of kernel32.dll so that the entry point points to the shared heap address where the assembler code is located. Because of the lack of a “Copy on Write” mechanism in Windows-95, this patching applies also to the client process.

[0172] 7) It resumes the main thread of the client process.

[0173] 8) In the client process, the entry point of kernel32.dll is called and, thus, the applied patch starts execution. The patch performs the following operations:

[0174] a) The patch removes the patch applied on the kernel32.dll entry point and restores the original kernel32.dll code.

[0175] b) The patch creates a new thread, which runs the injection function.

[0176] c) The injection function loads the client-side trace library 125.

[0177] d) The injection function initializes the client-side trace library 125 and the communication channel in the client process so that the two trace libraries 124, 125 can communicate.

[0178] 9) If inst95_attach returns successfully, then the initial instrumentation of the client process is done and the tracing begins.

[0179] During a Process Attach, BugTrapper calls the inst95_attach_to_running_process function in the agent-side trace library 124. The inst95_attach_to_running_process function executes the following sequence of operations:

[0180] 1) It initializes the communication channel for IPC with a client process.

[0181] 2) It calls a function create_remote_thread (not to be confused with the CreateRemoteThread API call in Windows-NT), that performs the following operations:

[0182] a) It allocates memory on the shared heap.

[0183] b) It copies various data (such as the Injection Function code and the path for the client-side trace library 125) onto the heap.

[0184] c) It finds a valid thread handle from the client process.

[0185] d) It suspends the valid thread.

[0186] e) It sets the single step flag in the valid thread context.

[0187] f) It releases the valid thread.

[0188] A device driver, which will be further described below, intercepts the INT 1 interrupt that is caused by the first executed instruction of the above mentioned valid thread. Upon receiving the interrupt, the device driver sets the instruction pointer to the start address of the injection function that was copied onto the shared heap, and clears the single step flag in the valid thread context. After clearing the single step flag, the driver proceeds as if the interrupt was successfully handled, and returns the control to Windows-95.

[0189] Since the instruction pointer now points to the injection function, the injection function starts to execute in the context of the client process. The injection function continues as in the case of Creation Attach described above and creates a new thread that subsequently performs the loading of the client-side trace library 125 into the address space of the client 102.

[0190] In order to leave the interrupted valid thread intact, the injection function executes the breakpoint instruction, which immediately causes an INT 3 interrupt that is intercepted by the device driver. The device driver restores the thread context that was stored immediately after the thread was suspended and then the device driver returns the control to Windows-95.
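
By way of illustration only, the single-step hand-off described above may be sketched on a simulated thread context as follows. This is a minimal sketch and not the actual device driver: the function names are hypothetical, the thread context is modeled as a dictionary holding only an instruction pointer and a flags register, and the sketch assumes that the context is saved at the moment the single step flag is set. The value 0x100 is the trap flag (bit 8) of the x86 EFLAGS register.

```python
TRAP_FLAG = 0x100  # bit 8 (TF) of the x86 EFLAGS register


def arm_single_step(context):
    """Agent side: save the suspended thread's context, then set the flag."""
    saved = dict(context)
    context["eflags"] |= TRAP_FLAG
    return saved


def on_int1(context, injection_addr):
    """Driver side: redirect the thread to the injection function."""
    context["eip"] = injection_addr
    context["eflags"] &= ~TRAP_FLAG


def on_int3(context, saved):
    """Driver side: restore the context so the thread resumes untouched."""
    context.clear()
    context.update(saved)
```

The borrowed thread runs the injection function only between the two interrupts, after which it continues from the instruction at which it was originally suspended.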

[0191] Tracing Execution

[0192] The trace function involves tracing the execution of the instrumented client process and reporting certain events to BugTrapper. The client-side trace library 125 accomplishes the tracing function by using breakpoints, and by reporting information concerning the status of the client process upon reaching the breakpoints.

[0193] During the execution of the client process, the execution trace is stored within a fixed size circular trace buffer 105 in memory. In the remote mode of operation the contents of the trace buffer 105 are copied to a trace log file 122. The trace log file 122 thus contains trace information that reflects a time window ending with the writing of the log file 122. The length of this time window is generally dependent upon the size of the trace buffer 105. In a preferred embodiment, the trace buffer 105 is small enough to allow the trace log file 122 to be sent to the developer's site using standard email programs. In the online mode of operation, the display is constantly being updated, mirroring the trace buffer 105. The displayed information can also be saved to a log file 122 and later re-displayed.

[0194] After the client process has been attached, the process of tracing the execution of the client 102 involves the steps of installing breakpoints, triggering breakpoints, and catching breakpoints. Breakpoints are installed by overwriting the target address of the assembly instruction to be traced with an INT 3 instruction, occupying a single byte of space. The original byte at that address, along with other information, is stored in a data structure created by the agent-side trace library 124. The data structure, which describes all trace points, is preferably a hash table comprising a corresponding array of records for each hash value. The hashing is implemented with the target address as a parameter, allowing for very fast searching for information concerning a trace point by using its address.
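The installation step can be sketched in Python as follows. This is a simulation under stated assumptions: a bytearray stands in for the client's code, a Python dict stands in for the hash table keyed by target address, and the names TracePointTable, install, lookup, and remove are hypothetical. The value 0xCC is the one-byte x86 INT 3 opcode.

```python
INT3 = 0xCC  # one-byte x86 breakpoint opcode


class TracePointTable:
    """Hypothetical trace point table keyed by target address."""

    def __init__(self, memory):
        self.memory = memory  # bytearray standing in for client code
        self.points = {}      # address -> record (dict is a hash table)

    def install(self, address, label):
        if address in self.points:
            return  # already installed; keep the saved original byte
        self.points[address] = {"original": self.memory[address],
                                "label": label}
        self.memory[address] = INT3

    def lookup(self, address):
        """Return the record for a trace point, or None if none exists."""
        return self.points.get(address)

    def remove(self, address):
        record = self.points.pop(address)
        self.memory[address] = record["original"]


memory = bytearray([0x55, 0x8B, 0xEC, 0xC3])
table = TracePointTable(memory)
table.install(1, "demo trace point")
```

A lookup that returns None corresponds to a breakpoint that was not installed by the tracer and must be passed on to the normal exception path.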

[0195] Breakpoints are triggered whenever the target address gets executed. When the target address is executed, the breakpoint instruction generates an INT 3 interrupt. On Windows NT/2000 this interrupt is handled by the Windows-NT/2000 kernel-mode handler. The kernel-mode handler transfers the execution to the user-mode routine KiUserExceptionDispatcher inside NTDLL.DLL (the system DLL). The KiUserExceptionDispatcher routine handles the task of locating a corresponding exception filter for the particular kind of exception.

[0196] Catching of breakpoints occurs within the context of the client 102. With standard debuggers, control would pass to the debugger process at this point. BugTrapper takes a new approach, eliminating the need for context switching to properly trace the execution (for better performance). Since no context switching takes place, control remains with the client 102.

[0197] When the client-side trace library 125 is initially loaded, a patch is applied to the KiUserExceptionDispatcher function, having the effect of forcing a call to a function in the client-side trace library 125 before processing the exception. This function (the BugTrapper exception handler) determines whether the breakpoint occurred as a result of the tracing or for another reason. An exception that is not the result of tracing (i.e., no trace point has been installed at this target address) will result in a return of execution to KiUserExceptionDispatcher. When an exception is the result of the tracing, the handler notifies the appropriate routines in the tracing library 125 and defers the breakpoint, thereby allowing the original instruction at the target address to execute.

[0198] To defer a breakpoint, the original byte at the target address is restored, returning execution while setting a trap flag in the FLAGS register of an x86 processor. The trap flag causes an INT 1 interrupt to occur as a result of the execution of the original instruction. This interrupt is also treated as an exception, eventually reflecting into the BugTrapper exception handler. The handler restores the breakpoint instruction at the target address and returns for a second time, allowing the client process code to continue running as if nothing happened.
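The two-step deferral can be pictured as a small state machine. The following Python sketch is a simulation only: it models the trap flag as a boolean and memory as a bytearray, and the class and method names (DeferringHandler, on_breakpoint, on_single_step) are hypothetical rather than BugTrapper identifiers.

```python
INT3 = 0xCC  # one-byte x86 breakpoint opcode


class DeferringHandler:
    """Hypothetical model of deferring one breakpoint."""

    def __init__(self, memory, address):
        self.memory = memory
        self.address = address
        self.original = memory[address]  # saved before patching
        self.memory[address] = INT3
        self.trap_flag = False
        self.hits = 0

    def on_breakpoint(self):
        """INT 3: report the hit, restore the original byte, and request
        a single step so the original instruction can run."""
        self.hits += 1
        self.memory[self.address] = self.original
        self.trap_flag = True

    def on_single_step(self):
        """INT 1: the original instruction has executed, so re-arm the
        breakpoint and clear the single-step request."""
        self.memory[self.address] = INT3
        self.trap_flag = False


code = bytearray([0x90, 0x40, 0xC3])
handler = DeferringHandler(code, 1)
handler.on_breakpoint()
byte_during_step = code[1]
handler.on_single_step()
```

Between the two calls the original byte is in place, which is what lets the traced instruction execute unmodified before the breakpoint is re-armed.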

[0199] In Windows 95/98, interception of the INT3 and INT1 interrupts is done by a device driver. The driver registers its interrupt handler for INT1 and INT3 interrupts. When the interrupt handler is called, it checks to see if the interrupt occurred in the context of the client process. If the interrupt occurred in the client process, then the interrupt handler changes the instruction pointer of the thread to the address of a routine in the client-side trace library 125, and passes back on its stack any data needed by the function (such as thread context). After this function handles the trace point, it triggers an additional INT 3 interrupt that is recognized by the device driver. The device driver acts as if the interrupt has been successfully handled, causing the traced thread to continue execution. When the device driver recognizes that an interrupt has occurred not in the context of the client process, then the device driver passes the interrupt to the operating system interrupt handler (thus not affecting the normal behavior of other programs in the system or the operating system itself).

[0200] When tracing a plain source line (e.g., not a function entry or exit point), the client-side trace library 125 inserts data in the trace buffer to indicate that a trace point has been reached. When reaching a function entry trace point (apart from writing data to the trace buffer) a special mechanism is used because tracing of information regarding both the entry to and exit from the function is desired. This is preferably accomplished by modifying the return address of the function. The return address is located on the stack. The original return address is saved and a new return address is inserted. The new return address points to a special assembly stub inside the client-side trace library 125. Therefore, when the function returns the assembly stub is called. The stub reports to the client-side trace library 125 function that the function has exited, and the client-side trace library 125 writes this trace point to the trace buffer. The stub then jumps to the real return address of the function.

[0201] In certain environments it is possible for a function to be entered but not properly exited. The function ceases running (with its stack erased and execution continuing elsewhere), but never returns to its caller. Therefore, for tracing purposes, it never returned to the BugTrapper assembly stub. For example, this would happen when a C++ exception occurs inside a function and the exception handler at an outer function instructs the function generating the exception to exit, or when the setjmp( )/longjmp( ) functions are used in C/C++ programs. To detect and trace such events, the microprocessor's stack pointer register (ESP) is checked whenever a trace point triggers to determine whether any functions have exited. The stack pointer normally grows down. Its position is registered at the entry of each function together with the above-mentioned return address. If the stack pointer has moved to a higher point than that at entry, the function is deemed to have exited, and the client-side trace library 125 reports that the function has exited. Several different redundant checks are also performed to ensure the reliability of this mechanism.
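The stack pointer check can be sketched as follows. The Python below is a simplified illustration that keeps one shadow stack of (function name, ESP at entry) pairs; the names ShadowStack, on_entry, and on_trace_point are hypothetical, the addresses are made up, and the redundant checks mentioned above are not modeled.

```python
class ShadowStack:
    """Hypothetical per-thread record of entered functions."""

    def __init__(self):
        self.frames = []  # (function name, ESP at entry), innermost last
        self.exited = []  # functions deemed exited, in detection order

    def on_entry(self, name, esp):
        self.frames.append((name, esp))

    def on_trace_point(self, esp):
        # The stack grows down, so a current ESP above a frame's entry
        # ESP means that frame has been unwound without a normal return.
        while self.frames and esp > self.frames[-1][1]:
            name, _ = self.frames.pop()
            self.exited.append(name)


stack = ShadowStack()
stack.on_entry("main", 0x1000)
stack.on_entry("f", 0x0F00)
stack.on_entry("g", 0x0E00)
# Execution resumes inside main (for example after longjmp) with a
# stack pointer above the entry positions of f and g.
stack.on_trace_point(0x0FF0)
```

In this example both g and f are reported as exited at the next trace point, while main remains on the shadow stack.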

[0202] Additional Tracing and Attaching Features

[0203] The BugTrapper attaching technology can be used with multi-process and multi-threaded applications. Every trace record is associated with a process and a thread. Stack information is separately kept for each context. Therefore, the BugTrapper can trace two or more client executables at the same time. This allows BugTrapper to display any context switches between the processes and threads of the client(s) 102.

[0204] The BugTrapper supports the tracing of Dynamically Linked Libraries (DLLs), including all sub-formats such as OCX, Active-X, drivers (DRV), etc. The tracing of DLLs is accomplished by analyzing the client 102 process to find the DLLs it uses, and by displaying the source structures of the DLLs to the user. The user can then specify trace points within the DLLs as is done for any other executable. When applying trace points to a DLL, BugTrapper finds the base address into which the DLL was loaded, and uses the address to translate the addresses in the debug information to actual addresses in the running image.
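One common way to perform such a translation is a simple rebasing calculation. The sketch below assumes that the debug information records addresses relative to the DLL's preferred base; that assumption, the function name to_runtime_address, and the example addresses are illustrative and are not stated in the text above.

```python
def to_runtime_address(debug_address, preferred_base, load_base):
    """Rebase an address from debug information onto the base address
    at which the DLL was actually loaded (hypothetical helper)."""
    offset = debug_address - preferred_base  # offset within the image
    return load_base + offset


# Example: debug info built for base 0x10000000, DLL loaded at 0x6A000000.
runtime = to_runtime_address(0x10001234, 0x10000000, 0x6A000000)
```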

[0205] The BugTrapper also supports the tracing of DLLs for which no debug information is available, such as system DLL's. The tracing of such DLLs is accomplished by tracking the exported functions used by the DLLs. This is done by analyzing the DLL exported function table in the client 102 to retrieve information concerning the exported function names and addresses.

[0206] The BugTrapper also supports tracing of sub-processes. For example, when a first process P1 and a second process P2 are listed in the executable pane 314, and P1 spawns P2 as a sub-process, then BugTrapper will start tracing P2. This is done by tracing the CreateProcess function in all of the traced processes, even if the developer 112 did not specify tracing the CreateProcess function. By tracing CreateProcess, BugTrapper will know that P1 spawned a sub-process, and BugTrapper can identify that the sub-process name (P2 in the present example) is listed in the executable pane 314. When the sub-process is created, BugTrapper will attach to the sub-process using the “Creation Attach” mechanism discussed above.

[0207] Variables and memory values can also be traced by BugTrapper. The user can view variable values as in an ordinary debugger. The variables may include function arguments, the C++ “this” pointer, function return values, local variables, global variables, static variables, etc. The data to which a pointer is pointing can also be traced. This information can be viewed for optimized builds, which cannot always be done by current debuggers. Tracking of variables in memory is accomplished by first analyzing the debug information to find the address (global, static, stack, or dynamic address) of the variable and the data it holds. BugTrapper then uses these addresses to dump to the trace log file 122 the memory content according to variable size.

[0208] When the traced application crashes, BugTrapper records the point where the failure occurred, even if the line was not specified in the TCI file 120. All stack variables are saved by using the Win32 debug API and the system library IMAGEHLP.DLL.

[0209] Interprocess Communication

[0210] Communication between the client-side trace library 125 and the agent-side trace library 124 (in the agent 104 or the analyzer 106) can be divided into two categories. Category one comprises normal messages. Category two comprises trace data.

[0211] Category one communication is accomplished using standard Windows InterProcess Communication (IPC) primitives, such as shared memory to pass data, and semaphores to signal and synchronize. Normal messages include commands sent to the client-side trace library 125, such as start trace function at a given address, or suspend tracing. Normal messages also include notifications sent by the client-side trace library 125, such as creation of a sub-process or run-time loading of a DLL.

[0212] The trace data itself is sent using a different mechanism, because of the quantity of data. Trace data comprises: function calls (including the assembly address of the called function); values of parameters for each call; function return values (including function address); tracing of other source lines specified in the TCI file 120 (including their address); variable values at these addresses; etc. The trace records are written to a shared memory area called the trace buffer 105, and from there either displayed in the BugTrapper user interface by the analyzer 106 (when performing an online trace) or written to a log file by the agent 104 (when performing a remote trace).

[0213] The client-side trace library 125 and the agent-side trace library 124 prevent simultaneous access to the trace buffer using a standard locking mechanism such as a Mutex (in Windows-95) or Interlocked Functions (in Windows-NT). For performance reasons, when collecting trace data, the client-side trace library 125 preferably only writes trace data to the trace buffer 105 in shared memory. The client-side trace library 125 preferably performs no I/O to the disk or to the display. Disk I/O and display updates are done later by the agent 104 or the analyzer 106. This reduces the performance penalty imposed on the client 102.

[0214] Indexing of the Trace Data

[0215] In order to process scrolling of the trace tree efficiently, there should desirably be direct access to records in the trace buffer 105 or trace log file 122. Serial access would be inefficient because it would require a search for the needed data in the trace buffer 105 upon every tree scroll operation. To facilitate direct access, an index is maintained with every trace tree window. The index contains the locations of all of the “function call” records in the trace buffer, which are included in the filter of the corresponding window in which the trace tree is displayed. In addition to the location information, some user-interface related information such as whether the record is invisible (“collapsed”) is kept. The developer 112 can “collapse” (remove from display) part of a tree which is located under a specific call in the tree hierarchy. Collapsing part of a tree influences the current displayed portion of the tree.

[0216] For example, assuming that only one record is displayed on a tree having a scroll bar, if the tree includes records (1 2 3 4 5) and the scroll bar is located at the middle, record 3 should be displayed. However, if records 2 and 3 are collapsed (leaving 1 4 5), then record 4 should be displayed. For a tree including more than a million lines, including thousands of collapsed records, the calculation of the location of the displayed portion of the trace data might be a time-consuming task. In order to do this efficiently, the analyzer 106 holds, together with the above-mentioned calls index, a special array SA, where SA[i] contains the number of visible records from record number 1000*i to 1000*(i+1). Use of the SA array greatly speeds up the task of locating desired trace information. For example, assume that records 500-550 are invisible (collapsed by the developer 112) and that the vertical scroll bar position is 1500. In this case SA[0]=950 and the appropriate record is 1550. The analyzer 106 calculates this number directly, without the need to scan the whole calls index: 1000 - SA[0] + 1500 (scroll bar position) = 1550. The SA array provides for very fast vertical scrolling. The SA array is updated each time a new record is read from the trace buffer 105 or the log file 122, or when the developer 112 collapses or expands some of the trace tree. In general, when the analyzer 106 draws a trace tree, it performs the following steps: (1) lock the trace buffer 105; (2) scan new records and update the calls index and the SA array; (3) read and analyze the records that must be shown; (4) merge the records with the debug information 121 and create strings for each record; (5) draw the page; and (6) unlock the shared memory trace buffer 105. Note that when reading data from a trace log file 122 only steps 3-5 are performed, since steps 1, 2, and 6 are unnecessary.
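The SA lookup described above can be sketched in Python. The sketch makes several assumptions that are not in the text: records and scroll positions are numbered from zero, the collapsed range 500-550 is read as the 50 records 500 through 549 (which is what yields SA[0]=950), and the function names build_sa and record_at are hypothetical.

```python
BLOCK = 1000  # number of records summarized by each SA entry


def build_sa(visible):
    """SA[i] = number of visible records in [BLOCK*i, BLOCK*(i+1))."""
    return [sum(visible[i:i + BLOCK]) for i in range(0, len(visible), BLOCK)]


def record_at(visible, sa, position):
    """Return the index of the record shown at a 0-based scroll position,
    counting only visible records."""
    remaining = position
    for block, count in enumerate(sa):
        if remaining >= count:
            remaining -= count  # skip the whole block in one step
            continue
        start = block * BLOCK
        for idx in range(start, min(start + BLOCK, len(visible))):
            if visible[idx]:
                if remaining == 0:
                    return idx
                remaining -= 1
    raise IndexError("position beyond the last visible record")


visible = [True] * 3000
for i in range(500, 550):
    visible[i] = False  # collapsed by the developer
sa = build_sa(visible)
target = record_at(visible, sa, 1500)
```

Whole blocks are skipped using one SA entry each, so only the final block is scanned record by record. With the data above the lookup lands on record 1550, matching the worked example.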

[0217] Visual Problem Monitor

[0218] In one embodiment, a visual problem monitor assists a support technician (e.g., a help desk person, a system administrator, etc.) in remotely analyzing problems by gathering run-time information about: program execution; interaction between the executing program and the operating system; system resources; user actions; file operations; failed operations; and screen output. For example, file interactions, DLL loading and/or registry accesses can be monitored non-intrusively. The support technician can remotely view user interactions with the program and corresponding reactions by the system. This mitigates (or in some cases eliminates) the “questions and answers” game that support technicians usually play with users in order to understand what the user did and what happened on the customer's PC.

[0219] By using the dynamic analysis capabilities of the visual problem monitor, the support technician can check the parameters that influenced the program more effectively than by scanning static data gathered from the user's computer. For example, there is no need to check the versions of all the DLL's in the user's computer or to dump the entire registry from the user's computer. Rather, by using the visual problem monitor, the support technician can choose to view only the DLL's used by the traced program, or the registry entries or files accessed by the traced program. The visual problem monitor helps the support technician understand the details of problems in cases where programs produce cryptic messages and in cases where the programs simply crash without any specific error message.

[0220] In one embodiment, the visual problem monitor uses the executable hooking technology described above. The hooking technology allows trace points to be added to a running program while preserving the program's original operation. Support and help desk technicians can use this technology for tracing software interaction with the system and other API functions, without access to the source code, and therefore it does not require extra work to be done by the software vendors. In one embodiment, tracing of API functions using BugTrapper hooking technology requires one standard TCI file for all Windows applications.

[0221] FIG. 15 is a block diagram showing the components of a visual problem monitor system 1500. The visual problem monitor system 1500 includes an information-gathering module 1501 that runs on the user's computer along with a client program 1509, and an information-display module 1502 that runs on the support technician's computer. The information-gathering module 1501 includes an Application Programming Interface (API) event hooking module 1506, a message event hooking module 1507, and a program code event hooking module 1508. The API event hooking module 1506, the message event hooking module 1507, and the program code event hooking module 1508 are controlled by, and send data to, an event processing engine 1503. The event processing engine 1503 stores information gathered from the program 1509 and the user's computer system in a log file 1505. The event processing engine retrieves commands and event tracing instructions from an event knowledge base 1504.

[0222] System interaction tracing allows support personnel to gather information about behavior of the program 1509, and to diagnose sources of errors. The dynamic tracing mechanism provided by the visual problem monitor system 1500 provides logging of the following Windows API functions and GUI events:

[0223] Calls of Windows API functions related to:

[0224] File and Directory operations

[0225] Registry operations

[0226] Environment variables

[0227] Spawned sub-processes

[0228] Loaded DLL's and other system components

[0229] IPC (semaphores, shared memory, messages, etc.)

[0230] WinSocket, RPC

[0231] SQL calls and related database operations

[0232] Keyboard input events

[0233] Mouse movement and mouse clicking events

[0234] Graphical screen capture of application windows updates

[0235] Calls to internal functions and code lines of applications. (This is an optional functionality for software producers, depending on availability of source code and debug information as described in the text accompanying FIGS. 1-14 above.)

[0236] The events are synchronized by time and logged into the log file 1505. Several mechanisms can be used for gathering information for event logging. Monitoring of Win32 API calls can be done using any of the following tools and techniques:

[0237] The hooking and tracing techniques described in connection with FIGS. 1-14 above.

[0238] The Microsoft Detours library

[0239] DLL redirection

[0240] The Microsoft Standard debug API

[0241] Different techniques can be used to capture user interactions and screen updates, including those used in such programs and products as:

[0242] Screen-capture tools (e.g. Lotus ScreenCam)

[0243] Remote PC administration tools (e.g. Norton PCAnywhere, Netvision OpSession, AT&T WinVNC)

[0244] One embodiment of the visual problem monitor system 1500 uses the following logging mechanisms: (1) the hooking mechanism described above is used to gather event data for logging of Windows API functions; and (2) hooking to Windows messages related to keyboard and mouse events and screen updates is used to gather event data for logging of GUI interactions and screen capture. In one embodiment, standard data compression techniques are used for compression of the visual information and other records in the log file 1505.

[0245] More specifically, the following system interaction functions are traced by the visual problem monitor system 1500:

[0246] File operations

[0247] Open/Close/Lock/Unlock

[0248] Create/Delete

[0249] Read/Write/Copy

[0250] Find

[0251] Get disk free space

[0252] Directory operations (SetCurrentDirectory, RemoveDirectory etc.)

[0253] Tracing of these operations allows detection of problems such as:

[0254] Attempts by the program 1509 to access a non-existing file

[0255] File operations by the program 1509 that violate file access permissions

[0256] File operations by the program 1509 to a full disk

[0257] File operations by the program 1509 to a file that is locked by another application

[0258] Environment values:

[0259] Registry operations (For example, the information-gathering module 1501 can detect when the application 1509 tries to read a non-existent key, a key has a wrong value, a key points to a missing file, etc.)

[0260] Environment variables

[0261] INI files (e.g. Profile Strings)

[0262] Loaded DLLs: For example, the information-gathering module 1501 can detect loaded DLL name, version, date, location on disk, etc. and pinpoint a missing DLL or a DLL having an incorrect version number.

[0263] Requested services/drivers: The information-gathering module 1501 can collect information on missing, incorrect, and misbehaving NT services and drivers.

[0264] Spawned sub-process: The information-gathering module 1501 can collect information regarding spawned executables (e.g., executable name, version, id, etc.). The information-gathering module 1501 can also log information regarding unsuccessful attempts to create a sub-process (e.g. because the executable was not found, etc.).

[0265] Crash information: The information-gathering module 1501 collects information regarding the name of an executable (or DLL) where a crash occurred, contents of the stack at the time of the crash, memory status, sequence of function calls before the crash, etc.

[0266] Communication Information

[0267] Event Log

[0268] Inter-Process Communication (e.g., Component Object Model (COM) messages, Distributed COM (DCOM) messages, semaphores, shared memory, messages, etc.)

[0269] Open DataBase Connectivity (ODBC) events

[0270] Networking events (e.g., Winsocket messages, Remote Procedure Call (RPC) information, etc.)

[0271] In one embodiment, the information collected by the information-gathering module 1501 and stored in the log file 1505 is passed to a remote support technician in order to allow the support technician to resolve software support issues related to the program 1509. The log file 1505 created by the information-gathering module 1501 is transferred to the information-display module 1502 running on the support technician's computer. The log file 1505 can be transferred using email, Web access, network file transfer protocols and the like.

[0272] The support technician can select between two modes of operation. In a first mode, the information-gathering module 1501 is continuously active. When a problem occurs, the log file 1505 is created. If the user chooses to call the help desk, the support technician can obtain the log file 1505 and use it for analysis. In a second mode, the information-gathering module 1501 is active on demand. In the second mode, when the user calls the help desk, the support technician activates the information-gathering module 1501 on the user's computer and receives the log file 1505 using network communication protocols. In one embodiment, the support technician receives the log file 1505 by using a TCP/IP-based communication protocol.

[0273] The information-display module 1502 is used by the support technician to view the data from the log file 1505 (as shown in FIG. 16 below). The information-display module 1502 allows the support technician to filter the display to show only specific types of events or the whole scenario. In one embodiment, suspicious events (e.g. loading a non-existing DLL) are highlighted.
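Filtering by event type and highlighting failed operations can be sketched as follows. The Event fields, the kind names, the "!!" marker, and the sample log entries are all hypothetical; the actual record layout of the log file 1505 is not specified here.

```python
from dataclasses import dataclass


@dataclass
class Event:
    time: float          # seconds since start of trace (assumed unit)
    kind: str            # e.g. "file", "registry", "dll"
    text: str            # human-readable description
    failed: bool = False


def render(events, kinds=None):
    """Return display lines in chronological order. If `kinds` is given,
    only those event types are shown; failed events get a '!!' marker."""
    lines = []
    for ev in sorted(events, key=lambda e: e.time):
        if kinds is not None and ev.kind not in kinds:
            continue
        marker = "!!" if ev.failed else "  "
        lines.append(f"{marker} {ev.time:7.3f} {ev.kind:<8} {ev.text}")
    return lines


log = [
    Event(0.250, "dll", "LoadLibrary example.dll", failed=True),
    Event(0.100, "file", "Open settings.ini"),
    Event(0.400, "registry", "Read example key"),
]
dll_only = render(log, kinds={"dll"})
everything = render(log)
```

Passing no filter shows the whole scenario, while a set of kinds narrows the view to specific event types.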

[0274] FIG. 16 shows the graphical user interface (GUI) 1600 provided to the support technician by the information-display module 1502. The GUI 1600 includes a window 1609 that lists executable modules (by file name) that comprise the program 1509 and the processes created by the executable modules. A window 1608 lists module information including the DLLs (with version numbers) used by the executable module. A window 1605 (shown as a tab) provides crash information in the event of a crash of the program 1509. A window 1607 (shown as a tab) lists environment information including environment variables, registry variables, and INI variables used by the program 1509. A window 1607 (shown as a tab) lists system information about the user's computer (that is, the computer running the program 1509 that is being traced). A window 1603 lists event information (by process) in chronological order. A window 1606 (shown as a tab) provides options to allow the support technician to define filters for the event information shown in the window 1603. The filters allow the support technician to specify which types of events are traced and displayed in the window 1603. A window 1602 shows screen captures from the user's computer. A group of video controls 1601 allows the support technician to “play the movie” of screen capture events obtained from the user's computer using standard video-type controls such as stop, play, rewind, fast forward, next frame, etc.

[0275] The GUI 1600 provides verbalization of data from the log file 1505. Events logged in the log file 1505 are displayed as textual strings in plain English, or another natural language, in the window 1603. Thus the support technician and PC users need relatively less programming experience to use the system 1500. In one embodiment, the screen captures shown in the window 1602 are replayed synchronously with the event displays provided by the GUI 1600. This allows the support technician to see what was happening on the user's screen when various events occurred in the user's system. Thus, for example, screen captures in the window 1602 are replayed synchronously with the replay of events in the windows 1603, 1608, etc. The support technician can use the controls 1601 to control (e.g., pause, rewind, etc.) the animated screen-capture display (in the window 1602) and the animated event displays provided by the GUI 1600.

[0276] In one embodiment, the log file 1505 is an extension of the trace log file 122 shown in FIG. 1B. The log file 1505 includes records related to logging of screen updates and user interaction with the application as follows:

[0277] vlSetFramebufferFormat (corresponding to a Set Framebuffer Format operation)

[0278] vlFramebufferUpdate (corresponding to a Framebuffer Update operation)

[0279] vlMouseMove (corresponding to a Mouse Move operation)

[0280] vlMouseClick (corresponding to a Mouse Click operation)

[0281] vlKeyPressure (corresponding to a Key Pressure operation)

[0282] vlNumBookmark (corresponding to a Numeric Bookmark)

[0283] vlStrBookmark (corresponding to a String Bookmark)

[0284] vlProcessAttached (corresponding to an Attach Process operation)

[0285] vlProcessDetached (corresponding to a Detach Process operation)

[0286] vlProcessTerminated (corresponding to a Terminate Process operation)

[0287] In one embodiment, the recording of GUI-related objects is based on intercepting Windows messages by the message event hooking module 1507. The message event hooking module 1507 is supplied with an Attach(ThreadIdent) method that sets a hooking function with help of the Windows SetWindowsHookEx( ) function and creates an additional thread. The Hook( ) function in the current thread analyzes intercepted messages and window regions that are re-drawn. As a result, special messages are generated and directed to the additional thread for transforming into records and writing into DirectAccessStream objects.

[0288] The vlFramebufferUpdate records are generated to save bitmaps of invalidated regions of windows. In one embodiment, bitmaps are created by reading video memory using Microsoft DirectX methods. In one embodiment, each created bitmap stores only a minimal rectangle corresponding to the window update region.
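One plausible reading of "minimal rectangle" is the smallest axis-aligned rectangle that encloses all invalidated areas of a window. The Python sketch below illustrates that reading only; the function name bounding_rect and the (left, top, right, bottom) tuple layout are assumptions, and no DirectX or Windows call is involved.

```python
def bounding_rect(rects):
    """Return the smallest (left, top, right, bottom) rectangle that
    encloses every rectangle in `rects`, or None if there are none."""
    rects = list(rects)
    if not rects:
        return None
    return (
        min(r[0] for r in rects),  # leftmost edge
        min(r[1] for r in rects),  # topmost edge
        max(r[2] for r in rects),  # rightmost edge
        max(r[3] for r in rects),  # bottommost edge
    )


dirty = [(10, 20, 50, 40), (30, 10, 80, 35)]
update = bounding_rect(dirty)
```

Saving only this rectangle, rather than the whole window, keeps each screen-update record small.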

[0289] A significant number of software problems arise from the deletion or corruption of critical files. In many cases the diagnostic messages issued by programs do not provide enough information for troubleshooting. The visual problem monitor system 1500 provides more information about the missing file problem. Consider, for example, a simple case with Acrobat Reader. If the font file Zd______.pfb is missing, then the Acrobat Reader is not started and the user gets the cryptic message “No ZapfDingbats or Multiple Master fonts found.” After getting this cryptic message, the user has to guess what happened with the application or the system, where to find the missing fonts, and how to restore the system to working order. A typical solution in such a case is to reinstall the whole application. Since the visual problem monitor system 1500 tracks file access operations, the visual problem monitor system 1500 can easily detect that the program lacks the file Zd______.pfb in the directory C:\Acrobat3\Reader\Fonts, thus providing a better way to resolve the problem.

[0290] DLL management represents a significant challenge for Windows users. The following scenario illustrates the problem. Assume that installation of a vendor's program overwrites the system DLL mapi32.dll with an older version without any warning message. As a result, after installing the vendor's program the Microsoft Notepad+ program fails to send any mail and gives the user a nonspecific message “SendMail failed to send message.” Since the visual problem monitor system 1500 tracks the use of DLLs, the visual problem monitor system 1500 can show a support technician that a function from mapi32.dll made a call to a nonexistent executable mapisrv.exe (the problem lies in a MAPI version mismatch). In one implementation, the visual problem monitor system 1500 includes a DLL management module that monitors DLL-related operations and detects typical DLL problems.

[0291] In one embodiment, the visual problem monitor and the BugTrapper can be used in concert to locate problems in software. Support technicians typically analyze visual problem monitor trace information without access to the source code. When the problem is caused by a bug in the source code of the client program, the trace log is transferred to a software developer. Software developers can open visual problem monitor trace logs using the BugTrapper analyzer and, by accessing source code, can view the calls of traced API functions in the source code. The escalation workflow is illustrated in the flowchart 1700 shown in FIG. 17. The flowchart 1700 begins at a process block 1701 where a visual problem monitor agent (comprising the event processing engine 1503 and one or more of the hooking modules 1506-1508) and the event knowledge database 1504 (an API-level TCI file) are sent to a user (e.g., a customer) site. The process block 1701 typically happens in response to a user complaint (regarding a software problem) to a support site. In a subsequent block 1702, the user generates a trace log file 1505 by running (or attempting to run) the malfunctioning client program in connection with the visual problem monitor agent. In a subsequent block 1703, the trace log file 1505 is transferred to the support site (e.g. by using the Internet, computer network, etc.). In a subsequent process block 1704, the trace log file 1505 is analyzed by using the visual problem monitor. If the reason for the software malfunction is found by using the visual problem monitor, then the process advances to a process block 1706 where the user is informed of the nature of the problem and, typically, how to correct the problem; otherwise, the process advances to a process block 1707. In the process block 1707, the trace log file 1505 is transferred to a developer (e.g., at a developer site). In a subsequent process block 1708, the developer uses the BugTrapper source code analyzer (with application source code inputs from a process block 1709) to search for program bugs in the malfunctioning application.

[0292] Other Embodiments

[0293] Although the present invention has been described with reference to a specific embodiment, other embodiments will occur to those skilled in the art. It is to be understood that the embodiment described above has been presented by way of example, and not limitation, and that the invention is defined by the appended claims.

What is claimed is:
 1. A software system that facilitates the process of identifying and isolating software execution problems within a program without requiring modifications to the executable of the client program, said system comprising: an information-gathering module that monitors selected events occurring during execution of the client program and stores data describing said events in a log file, said information-gathering module configured to monitor API events, message events, and program events, said information-gathering module further configured to obtain screen captures during execution of the client program, said information-gathering module configured to connect to said client program at runtime by hooking an in-memory executable image of said client program; and an information-display module that displays information from said log file, said information-display module configured to list events logged in said log file, said information-display module further configured to display screen captures obtained by said information-gathering module, said information-display module configured to run on a different computer than said information-gathering module, thereby allowing remote troubleshooting of said client program.
 2. The software system of claim 1, wherein said information-gathering module monitors file access operations.
 3. The software system of claim 1, wherein said information-gathering module monitors and highlights failed system interactions.
 4. The software system of claim 1, wherein said information-display module displays screen captures synchronized with logged events.
 5. The software system of claim 1, wherein said information-display module replays screen captures in sequence.
 6. The software system of claim 1, wherein said information-display module replays screen captures in sequence to produce a screen capture sequence, said information-display module also showing event information in sequence to produce an event information sequence, said event information sequence synchronized with said screen capture sequence.
 7. The software system of claim 1, wherein said information-gathering module monitors attempts by said client program to access a Windows registry.
 8. The software system of claim 1, wherein said information-gathering module monitors use of DLLs.
 9. The software system of claim 1, wherein said information-gathering module monitors attempts by said client program to spawn a subprocess or create a thread.
 10. The software system of claim 1, wherein said information-gathering module monitors database operations.
 11. The software system of claim 1, wherein said information-display module includes filters to control displaying of events in said log file.
 12. The software system of claim 1, wherein said information-gathering module monitors interprocess communication performed by said client program.
 13. The software system of claim 12, wherein said interprocess communication includes communication using COM.
 14. The software system of claim 12, wherein said interprocess communication includes communication using DCOM.
 15. The software system of claim 12, wherein said interprocess communication includes communication using semaphores.
 16. The software system of claim 12, wherein said interprocess communication includes communication using shared memory.
 17. The software system of claim 12, wherein said interprocess communication includes communication using network protocols.
 18. A method for remotely troubleshooting problems occurring when trying to execute a client program on a remote computer, comprising: loading a client program on a remote computer to create an in-memory executable image of said client program; loading an information-gathering module on said remote computer, said information-gathering module configured to connect to said client program at runtime by hooking said in-memory executable image, said information-gathering module configured to monitor selected events occurring during execution of said client program and store event data describing said events, said information-gathering module configured to monitor API events, message events, and program events, said information-gathering module further configured to obtain screen captures during execution of said client program; loading an information-display module on a second computer; and sending said event data to said information-display module, said information-display module configured to receive said event data and list events logged in said event data, said information-display module further configured to display screen captures obtained by said information-gathering module.
 19. The method of claim 18, wherein said information-gathering module monitors file access operations.
 20. The method of claim 18, wherein said information-gathering module monitors attempts by said client program to access non-existent files.
 21. The method of claim 18, wherein said information-gathering module monitors attempts by said client program to access protected files.
 22. The method of claim 18, wherein said information-gathering module monitors attempts by said client program to write to a full disk.
 23. The method of claim 18, wherein said information-gathering module monitors attempts by said client program to access locked files.
 24. The method of claim 18, wherein said information-gathering module monitors attempts by said client program to access one or more registry entries.
 25. The method of claim 18, wherein said information-gathering module monitors use of one or more DLLs.
 26. The method of claim 18, wherein said information-gathering module monitors attempts by said client program to spawn a subprocess.
 27. The method of claim 18, wherein said information-gathering module monitors attempts by said client program to create a thread.
 28. The method of claim 18, wherein said information-gathering module monitors interprocess communication performed by said client program.
 29. The method of claim 18, further comprising the step of defining one or more filters to control how said information-display module displays said event data.
 30. The method of claim 18, wherein said information-display module creates a first window to display a list of events monitored by said information-gathering module, and wherein said information-display module creates a second window to display screen capture information from said remote computer.
 31. The method of claim 30, wherein said information-display module creates a third window to display a list of DLLs used by said client program.
 32. A system for remotely troubleshooting problems occurring when trying to execute a client program on a remote computer, comprising: means for monitoring events and capturing screenshots occurring during execution of a client program and storing data describing said events, said events including API events, message events, and program events; means for hooking said means for monitoring to an in-memory executable copy of said client program; and an information-display module that displays said data describing said events, said information-display module configured to list events in chronological order, said information-display module further configured to display screen captures obtained by said information-gathering module.